canceling the adverse channel effects. In the following, the term "pre-emphasis
unit" is used interchangeably with the term "equalizer unit". If a signal passes
through both the transmission channel 13 and the pre-emphasis unit 112, the
resulting signal no longer shows any frequency dependent loss, and the unwanted
inter-symbol interference is reduced to zero in the ideal case.
The pre-emphasis unit 112 can also be called a predistortion filter.

The pre-emphasis unit 112 comprises an input for setting parameters P. The
parameters can also be called coefficients or weights. By means of these
parameters P, the frequency dependent gain curve of the pre-emphasis unit 112 is
adjusted. In particular, the pre-emphasis unit 112 can be a programmable finite
impulse response (FIR) filter having a characteristic such as
$V_{out}(n) = c_0 \cdot V_{in}(n) + c_1 \cdot V_{in}(n-1) + \ldots$, with
$c_0$, $c_1$ as adjustable parameters of the pre-emphasis unit.

In order to transmit a pulse via a transmission channel 13 showing a frequency
dependent loss, in particular at high frequencies, the pre-emphasis unit 112 has
to be programmed with parameters such that the initial part of the pulse to be
transmitted is amplified, which boosts in particular the high frequencies
associated with the edges of a pulse, or, conversely, such that the amplitude of
the low frequency components is attenuated. By applying the right parameters to
the pre-emphasis unit, the joint frequency response of the pre-emphasis unit and
the transmission channel can be made to show a nearly constant loss over the
frequencies of interest.
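
The effect of such parameters on a pulse can be illustrated with a minimal sketch of a two-tap FIR filter. The function name `pre_emphasis` and the coefficient values 1.25 and -0.25 are assumptions chosen for illustration only; they are not taken from the description above.

```python
def pre_emphasis(samples, coefficients):
    """Apply an FIR filter: out[n] = sum over k of coefficients[k] * samples[n - k]."""
    output = []
    for n in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(coefficients):
            if n - k >= 0:  # samples before the start are treated as zero
                acc += c * samples[n - k]
        output.append(acc)
    return output


# Hypothetical coefficients: a positive main tap and a negative delayed tap.
pulse = [0, 0, 1, 1, 1, 0, 0]
shaped = pre_emphasis(pulse, [1.25, -0.25])
print(shaped)  # [0.0, 0.0, 1.25, 1.0, 1.0, -0.25, 0.0]
```

With these illustrative values the first sample after each edge is boosted while the flat part of the pulse keeps unit amplitude: the gain is c0 + c1 = 1.0 for a constant input and c0 - c1 = 1.5 for an input alternating every sample, which is the high-frequency emphasis described above.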

The sender 11 further comprises a bit pattern sequence generator 111 for
generating bit pattern sequences BSP. The fact that the generator is called a
bit pattern generator does not limit its mode of operation to generating binary
digits: depending on the coding that is used, the bit pattern generator 111
might also provide multilevel patterns or whatever reference signal is needed.
In particular, a so-called PRBS7 bit pattern sequence is generated and applied
during the initialization phase. The PRBS7 bit pattern sequence comprises a
defined sequence of 127 bits with approximately the same number of "0" digits as
"1" digits, and is transmitted repeatedly. The PRBS7 sequence is generated by
means of a feedback shift register. Such a feedback shift register is also used
on the receiver's side for receiving the PRBS7 sequence. These registers are not
shown in detail in FIG. 4. In general, any pseudo random bit sequence can be
transmitted, as long as the sequence is known at the receiver.
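
A minimal sketch of a feedback shift register producing a 127-bit sequence, together with a receiver-side check, is given below. The feedback polynomial x^7 + x^6 + 1 is a common choice for PRBS7 but is an assumption here, as are the seed and the function names `prbs7` and `count_errors`; the text does not specify them.

```python
def prbs7(length=127, seed=0x7F):
    """Return `length` bits from a 7-bit feedback shift register.

    Assumed feedback: XOR of register stages 7 and 6 (polynomial x^7 + x^6 + 1).
    """
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero, or the register stays at zero")
    bits = []
    for _ in range(length):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1  # XOR of stages 7 and 6
        bits.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7F      # shift the new bit in
    return bits


def count_errors(received):
    """Receiver-side check against the same recurrence.

    Each bit must equal the XOR of the bits received 7 and 6 positions earlier,
    so the receiver needs no knowledge of the sender's seed.
    """
    return sum(
        1
        for n in range(7, len(received))
        if received[n] != (received[n - 7] ^ received[n - 6])
    )


sequence = prbs7(254)                      # two repetitions of the pattern
print(sequence[:127] == sequence[127:])    # the pattern repeats after 127 bits
print(sum(sequence[:127]))                 # number of "1" digits in one period
print(count_errors(sequence))              # 0 for an undisturbed sequence
```

Because the receiver checks the recurrence rather than comparing against a stored copy, it synchronizes after the first seven received bits; a single flipped bit shows up as a small burst of mismatches.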
METHOD OF DESIGNING AGONISTS AND ANTAGONISTS TO IGF RECEPTOR

Field of the Invention

This invention relates to the field of receptor structure and
receptor/ligand interactions. In particular it relates to the field of using
receptor structure to predict the structure of related receptors and to the use
of the determined structures and predicted structures to select and screen for
agonists and antagonists of the polypeptide ligands.
Background of the Invention 

Insulin is the peptide hormone that regulates glucose uptake and
metabolism. The two types of diabetes mellitus are associated either with an
inability to produce insulin because of destruction of the pancreatic islet
cells (Homo-Delarche, F. & Boitard, C., 1996, Immunol. Today 10: 456-460) or
with poor glucose metabolism resulting from either insulin resistance at the
target tissues, or from inadequate insulin secretion by the islets or faulty liver
function (Taylor, S. I., et al., 1994, Diabetes, 43: 735-740).

Insulin-like growth factors-1 and 2 (IGF-1 and 2) are structurally
related to insulin, but are more important in tissue growth and development
than in metabolism. They are primarily produced in the liver in response to
growth hormone, but are also produced in most other tissues, where they
function as paracrine/autocrine regulators. The IGFs are strong mitogens, and
are involved in numerous physiological states and certain cancers (Baserga,
R., 1996, TibTech 14: 150-152).

Epidermal growth factor (EGF) is a small polypeptide cytokine that is
unrelated to the insulin/IGF family. It stimulates marked proliferation of
epithelial tissues, and is a member of a larger family of structurally-related
cytokines, such as transforming growth factor α, amphiregulin, betacellulin,
heparin-binding EGF and some viral gene products. Abnormal EGF family
signalling is a characteristic of certain cancers (Soler, C. & Carpenter, G.,
1994, In Nicola, N. (ed) "Guidebook to Cytokines and Their Receptors", Oxford
Univ. Press, Oxford, pp. 194-197; Walker, F. & Burgess, A. W., 1994, In Nicola,
N. (ed) "Guidebook to Cytokines and Their Receptors", Oxford Univ. Press,
Oxford, pp. 198-201).

Each of these growth factors mediates its biological actions through
binding to the corresponding receptor. The IR, IGF-1R and the insulin
receptor-related receptor (IRR), for which the ligand is not known, are closely
related to each other, and are referred to as the insulin receptor subfamily. A






WO 99/28347



PCT/AU98/00998 






large body of information is now available concerning the primary structure
of these insulin receptor subfamily members (Ebina, Y., et al., 1985, Cell 40:
747-758; Ullrich, A., et al., 1985, Nature 313: 756-761; Ullrich, A., et al.,
1986, EMBO J. 5: 2503-2512; Shier, P. & Watt, V. M., 1989, J. Biol. Chem. 264:
14605-14608) and the identification of some of their functional domains (for
reviews see De Meyts, P., 1994, Diabetologia 37: 135-148; Lee, J. & Pilch, P.
F., 1994, Amer. J. Physiol. 266: C319-C334; Schaffer, L., 1994, Eur. J. Biochem.
221: 1127-1132). IGF-1R, IR and IRR are members of the tyrosine kinase
receptor superfamily and are closely related to the epidermal growth factor
receptor (EGFR) subfamily, with which they share significant sequence
identity in the extracellular region as well as in the cytoplasmic kinase
domains (Ullrich, A., et al., 1984, Nature 309: 418-425; Ward, C. W., et al., 1995,
Proteins: Structure Function & Genetics 22: 141-153). Both the insulin and
EGF receptor subfamilies have a similar arrangement of two homologous
domains (L1 and L2) separated by a cys-rich region of approximately 160
amino acids containing 22-24 cys residues (Bajaj, M., et al., 1987, Biochim.
Biophys. Acta 916: 220-226; Ward, C. W., et al., 1995, Proteins: Structure
Function & Genetics 22: 141-153). The C-terminal portion of the IGF-1R
ectodomain (residues 463 to 906) comprises four domains: a connecting
domain, two fibronectin type 3 (Fn3) repeats, and an insert domain (O'Bryan,
J. P., et al., 1991, Mol. Cell. Biol. 11: 5016-5031). The C-terminal portion of the
EGFR ectodomain (residues 477-621) consists solely of a second cys-rich
region containing 20 cys residues (Ullrich, A., et al., 1984, Nature 309:
418-425).

Little is known about the secondary, tertiary and quaternary structure
of the ectodomains of these receptor subfamilies. Unlike the members of the
EGFR subfamily, which are transmembrane monomers that dimerise on
binding ligand, the IR subfamily members are homodimers, held together by
disulphide bonds. The extracellular region of the IR/IGF-1R/IRR monomers
contains an α-chain (~703 to 735 amino acid residues) and 192-196 residues
of the β-chain. There is a ~23 residue transmembrane segment, followed by
the cytoplasmic portion (354 to 408 amino acids), which contains the
catalytic tyrosine kinase domain flanked by juxtamembrane and C-tail
regulatory regions and is responsible for mediating all receptor-specific
functions (White, M. F. & Kahn, C. R., 1994, J. Biol. Chem. 269: 1-4). Chemical
analyses of the receptor suggest that the α-chains are linked to the β-chains
via a single disulphide bond, with the IR dimer being formed by at least two
α-α disulphide linkages (Finn, F. M., et al., 1990, Proc. Natl. Acad. Sci. 87:
419-423; Chiacchia, K. B., 1991, Biochem. Biophys. Res. Commun. 176: 1178-1182;
Schaffer, L. & Ljungqvist, L., 1992, Biochem. Biophys. Res. Comm. 189:
650-653; Sparrow, L. G., et al., 1997, J. Biol. Chem. 47: 29460-29467).

Although the three-dimensional (3D) structures of the ligands EGF,
TGF-alpha (Hommel, U., et al., 1992, J. Mol. Biol. 227:271-282), insulin
(Dodson, E. J., et al., 1983, Biopolymers 22:281-291), IGF-1 (Sato, A., et al.,
1993, Int. J. Peptide Protein Res. 41:433-440) and IGF-2 (Torres, A. M., et
al., 1995, J. Mol. Biol. 248:385-401) are known, and numerous analytical and
functional studies of ligand binding to EGFR (Soler, C. & Carpenter, G., 1994,
In Nicola (ed) "Guidebook to Cytokines and Their Receptors", Oxford Univ.
Press, Oxford, pp. 194-197), IGF-1R and IR (see De Meyts, P., 1994,
Diabetologia, 37:135-148) have been carried out, the mechanisms of ligand
binding and subsequent transmembrane signalling have not been resolved.

Ligand-induced, receptor-mediated phosphorylation is the signalling
mechanism by which most cytokines, polypeptide hormones and
membrane-anchored ligands exert their biological effects. The primary kinase
may be part of the intracellular portion of the transmembrane receptor protein,
as in the tyrosine kinase receptors (for review see Yarden, Y., et al., 1988, Ann.
Rev. Biochem. 57:443-478) or the Ser/Thr kinase receptors (Alevizopoulos, A.
& Mermod, N., 1997, BioEssays 19:581-591), or may be non-covalently
associated with the cytoplasmic tail of the transmembrane protein(s) making
up the receptor complex, as in the case of the haemopoietic growth factor
receptors (Stahl, N., et al., 1995, Science 267:1349-1353). The end result is
the same: ligand binding leads to receptor dimerization or oligomerization, or
to a conformational change in pre-existing receptor dimers or oligomers,
resulting in activation, by transphosphorylation, of the covalently attached or
non-covalently associated protein kinase domains (Hunter, T., 1995, Cell,
80:225-236).

Many oncogenes have been shown to be homologous to growth
factors, growth factor receptors or molecules in the signal transduction
pathways (Baserga, R., 1994, Cell, 79:927-930; Hunter, T., 1997, Cell,
88:333-346). One of the best examples is v-Erb (related to the EGFR). Since
overexpression of a number of growth factor receptors results in
ligand-dependent transformation, an alternate strategy for oncogenes is to
regulate the expression of growth factor receptors or their ligands or to
directly bind to the receptors to stimulate the same effect (Baserga, R., 1994,
Cell, 79:927-930). Examples are v-Src, which activates IGF-1R intracellularly;
c-Myb, which transforms cells by enhancing the expression of IGF-1R; and SV40 T
antigen, which interacts with the IGF-1R and enhances the secretion of IGF-1
(see Baserga, R., 1994, Cell, 79:927-930 for review). Cells in which the IGF-1R
has been disrupted or deleted cannot be transformed by SV40 T antigen. If
oncogenes activate growth factors and their receptors, then tumour
suppressor genes should have the opposite effect. One good example of this
is the Wilms' tumour suppressor gene, WT1, which suppresses the expression
of IGF-1R (Drummond, J. A., et al., 1992, Science, 257:275-277). Cells that are
driven to proliferate by oncogenes undergo massive apoptosis when growth
factor receptors are ablated, since, unlike normal cells, they appear unable to
withdraw from the cell-cycle and enter into the G0 phase (Baserga, R., 1994,
Cell, 79:927-930).

The insulin-like growth factor-1 receptor (IGF-1R) is one of several
growth-factor receptors that regulate the proliferation of mammalian cells.
However, its ubiquity and certain unique aspects of its function make
IGF-1R an ideal target for specific therapeutic interventions against abnormal
growth, with very little effect on normal cells (see Baserga, R., 1996,
TIBTECH 14:150-152). The receptor is activated by IGF1, IGF2 and insulin,
and plays a major role in cellular proliferation in at least three ways: it is
essential for optimal growth of cells in vitro and in vivo; several cell types
require IGF-1R to maintain the transformed state; and activated IGF-1R has a
protective effect against apoptotic cell death (Baserga, R., 1996, TIBTECH
14:150-152). These properties alone make it an ideal target for therapeutic
interventions. Transgenic experiments have shown that IGF-1R is not an
absolute requirement for cell growth, but is essential for the establishment of
the transformed state (Baserga, R., 1994, Cell, 79:927-930). In several cases
(human glioblastoma; human melanoma; human breast carcinoma; human
lung carcinoma; human ovarian carcinoma; human rhabdomyosarcoma;
mouse melanoma; mouse leukaemia; rat glioblastoma; rat
rhabdomyosarcoma; hamster mesothelioma) the transformed phenotype can
be reversed by decreasing the expression of IGF-1R using antisense to IGF-1R
(Baserga, R., 1996, TIBTECH 14:150-152); or by interfering with its function
by antibodies to IGF-1R (human breast carcinoma; human
rhabdomyosarcoma) or by dominant negatives of IGF-1R (rat glioblastoma;
Baserga, R., 1996, TIBTECH 14:150-152).

Three effects are observed when the function of IGF-1R is impaired:
tumour cells undergo massive apoptosis, which results in inhibition of
tumourigenesis; surviving tumour cells are eliminated by a specific immune
response; and such a host response can cause a regression of an established
wild-type tumour (Resnicoff, M., et al., 1995, Cancer Res. 54:2218-2222).
These effects, plus the fact that interference with IGF-1R function has a
limited effect on normal cells (partial inhibition of growth without apoptosis),
make IGF-1R a unique target for therapeutic interventions (Baserga, R., 1996,
TIBTECH 14:150-152). In addition, IGF-1R is downstream of many other
growth factor receptors, which makes it an even more generalised target. The
implication of these findings is that if the number of IGF-1Rs on cells can be
decreased or their function antagonised, then tumours cease to grow and can
be removed immunologically. These studies establish that IGF-1R
antagonists will be extremely important therapeutically.

Many cancer cells have constitutively active EGFR (Sandgreen, E. P.,
et al., 1990, Cell, 61:1121-135; Karnes, W. E. J., et al., 1992, Gastroenterology,
102:474-485) or other EGFR family members (Hines, N. E., 1993, Semin.
Cancer Biol. 4:19-26). Elevated levels of activated EGFR occur in bladder,
breast, lung and brain tumours (Harris, A. L., et al., 1989, In Furth & Greaves
(eds) The Molecular Diagnostics of Human Cancer, Cold Spring Harbor Lab.
Press, CSH, NY, pp. 353-357). Antibodies to EGFR can inhibit ligand activation
of EGFR (Sato, J. D., et al., 1983, Mol. Biol. Med. 1:511-529) and the growth of
many epithelial cell lines (Aboud-Pirak, E., et al., 1988, J. Natl Cancer Inst.
85:1327-1331). Patients receiving repeated doses of a humanised chimeric
anti-EGFR monoclonal antibody showed signs of disease stabilization. The
large doses required and the cost of production of humanised monoclonal
antibody are likely to limit the application of this type of therapy. These
findings indicate that EGF antagonists, once developed, will be attractive
anticancer agents.
Summary of the Invention 

The present inventors have now obtained 3D structural information
concerning the insulin-like growth factor receptor (IGF-1R). This information
can be used to predict the structure of related members of the insulin
receptor family and provides a rational basis for the development of ligands
for specific therapeutic applications.

Accordingly, in a first aspect the present invention provides a method
of designing a compound able to bind to a molecule of the insulin receptor
family and to modulate an activity mediated by the molecule, including the step
of assessing the stereochemical complementarity between the compound and the
receptor site of the molecule, wherein the receptor site includes:

(a) amino acids 1 to 462 of the receptor for IGF-1, having the atomic
coordinates substantially as shown in Figure 1;

(b) a subset of said amino acids; or

(c) amino acids present in the amino acid sequence of a member of the
insulin receptor family, which form an equivalent three-dimensional structure to
that of the receptor molecule as depicted in Figure 1.

The phrase "insulin receptor family" encompasses, for example, IGF-1R,
IR and IRR. In general, insulin receptor family members show similar
domain arrangements and share significant sequence identity (preferably at
least 40% identity).

By "stereochemical complementarity" we mean that the biologically
active substance or a portion thereof correlates, in the manner of the classic
"lock-and-key" visualisation of ligand-receptor interaction, with the groove in
the receptor site.

In a preferred embodiment of this aspect of the invention, the compound 
is selected or modified from a known compound identified from a database. 

In a further preferred embodiment, the compound is designed so as to
complement the structure of the receptor molecule as depicted in Figure 1.

In a further preferred embodiment, the compound has structural regions
able to make close contact with amino acid residues at the surface of the
receptor site lining the groove, as depicted in Figure 2.

In a further preferred embodiment, the compound has a stereochemistry
such that it can interact with both the L1 and L2 domains of the receptor site.

In a further preferred embodiment, the compound has a stereochemistry
such that it can interact with the L1 domain of a first monomer of the receptor
homodimer, and with the L2 domain of the other monomer of the receptor
homodimer.

In a further preferred embodiment, the interaction of the compound
with the receptor site alters the position of at least one of the L1, L2 or
cysteine-rich domains of the receptor molecule relative to the position of at
least one of the other of said domains. Preferably, the compound interacts with
the β sheet of the L1 domain of the receptor molecule, thereby causing an
alteration in the position of the L1 domain relative to the position of the
cysteine-rich domain or of the L2 domain. Alternatively, the compound interacts
with the receptor site in the region of the interface between the L1 domain and
the cysteine-rich domain of the receptor molecule, thereby causing the L1 domain
and the cysteine-rich domain to move away from each other. In another preferred
embodiment, the compound interacts with the hinge region between the L2 domain
and the cysteine-rich domain of the receptor molecule, thereby causing an
alteration in the positions of the L2 domain and the cysteine-rich domain
relative to each other.

In a further preferred embodiment, the stereochemical complementarity
between the compound and the receptor site is such that the compound has a
[...] for the receptor site of less than 10^-[...] M, more preferably less than
10^-[...] M.

In a further preferred embodiment of the first aspect of the present
invention, the compound has the ability to increase an activity mediated by the
receptor molecule.

In a further preferred embodiment, the compound has the ability to
decrease an activity mediated by the receptor molecule. Preferably, the
stereochemical interaction between the compound and the receptor site is
adapted to prevent the binding of a natural ligand of the receptor molecule to the
receptor site. It is preferred that the compound has a [...] of less than
10^-[...] M, more preferably less than 10^-[...] M and more preferably less than
10^-[...] M.

In a further preferred embodiment of the first aspect of the present
invention, the receptor is the IGF-1R, or the insulin receptor.

In a second aspect, the present invention provides a computer-assisted
method for identifying potential compounds able to bind to a molecule of the
insulin receptor family and to modulate an activity mediated by the molecule,
using a programmed computer including a processor, an input device, and an
output device, including the steps of:

(a) inputting into the programmed computer, through the input
device, data comprising the atomic coordinates of the IGF-1R molecule as shown
in Figure 1, or a subset thereof;

(b) generating, using computer methods, a set of atomic coordinates of
a structure that possesses stereochemical complementarity to the atomic
coordinates of the IGF-1R site as shown in Figure 1, or a subset thereof, thereby
generating a criteria data set;

(c) comparing, using the processor, the criteria data set to a computer
database of chemical structures;

(d) selecting from the database, using computer methods, chemical
structures which are structurally similar to a portion of said criteria data set; and

(e) outputting, to the output device, the selected chemical structures
which are similar to a portion of the criteria data set.
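
Steps (c) to (e) can be illustrated with a toy sketch. It assumes that a criteria data set has already been generated in step (b) and is available as a list of 3D points; each database entry is likewise a hypothetical list of points. The similarity measure used here (comparing sorted pairwise-distance profiles) and the names `distance_profile`, `dissimilarity` and `screen` are illustrative assumptions, not the computer methods of the invention, which the text does not specify.

```python
import math
from itertools import combinations


def distance_profile(coords):
    """Sorted pairwise distances; unchanged by rotating or translating the points."""
    return sorted(math.dist(p, q) for p, q in combinations(coords, 2))


def dissimilarity(profile_a, profile_b):
    """Root-mean-square difference between two distance profiles."""
    if len(profile_a) != len(profile_b):
        return math.inf  # toy rule: only compare structures of equal size
    if not profile_a:
        return 0.0
    squared = sum((a - b) ** 2 for a, b in zip(profile_a, profile_b))
    return math.sqrt(squared / len(profile_a))


def screen(criteria_coords, database, tolerance=0.5):
    """Steps (c)-(e): compare, select, and return hits sorted by score."""
    target = distance_profile(criteria_coords)
    hits = []
    for name, coords in database.items():
        score = dissimilarity(target, distance_profile(coords))
        if score <= tolerance:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical data: a three-point criteria set and a tiny structure database.
criteria = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]
database = {
    "translated": [(5, 5, 5), (6, 5, 5), (5, 6, 5)],  # same shape, moved
    "stretched": [(0, 0, 0), (3, 0, 0), (0, 3, 0)],   # same topology, 3x larger
    "pair": [(0, 0, 0), (1, 0, 0)],                   # different size
}
for name, score in screen(criteria, database):
    print(name, score)
```

A practical implementation would also have to handle partial matches ("a portion of said criteria data set"), atom types and conformational flexibility, none of which this sketch attempts.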

In a preferred embodiment of the second aspect, the programmed
computer includes a data storage system which includes the database of
chemical structures.

In a preferred embodiment of the second aspect, the method is used to
identify potential compounds which have the ability to decrease an activity
mediated by the receptor.

In another preferred embodiment, the computer-assisted method further
includes the step of selecting one or more chemical structures from step (e)
which interact with the receptor site of the molecule in a manner which
prevents the binding of natural ligands to the receptor site.

In another preferred embodiment, the computer-assisted method further
includes the step of obtaining a compound with a chemical structure selected in
steps (d) and (e), and testing the compound for the ability to decrease an activity
mediated by the receptor.

In a further preferred embodiment, the computer-assisted method is
used to identify potential compounds which have the ability to increase an
activity mediated by the receptor molecule.

In another preferred embodiment, the computer-assisted method further
includes the step of obtaining a molecule with a chemical structure selected in
steps (d) and (e), and testing the compound for the ability to increase an activity
mediated by the receptor.

In a further preferred embodiment of the second aspect of the present
invention, the receptor is the IGF-1R, or the insulin receptor.

In a third aspect, the present invention provides a method of screening
a putative compound having the ability to modulate the activity of a receptor
of the insulin receptor family, including the steps of identifying a putative
compound by a method according to the first or second aspects, and testing the
compound for the ability to increase or decrease an activity mediated by the
receptor.

In a preferred embodiment of the third aspect, the test is carried out in
vitro.

In a further preferred embodiment of the third aspect, the test is a high
throughput assay.

In a preferred embodiment of the third aspect, the test is carried out in
vivo.

Brief Description of the Drawings

Figure 1. IGF-1R residues 1-462, in terms of atomic coordinates refined to a
resolution of 2.6 Å (average accuracy ≈ 0.3 Å). The coordinates are in relation
to a Cartesian system of orthogonal axes.

Figure 2. Depiction of the residues lining the groove of the IGF-1R receptor
fragment 1-462.

Figure 3. Gel filtration chromatography of affinity-purified IGF-1R/462
protein. The protein was purified on a Superdex S200 column (Pharmacia)
fitted to a BioLogic L.C. system (Biorad), equilibrated and eluted at 0.8
ml/min with 40 mM Tris/150 mM NaCl/0.02% NaN3 adjusted to pH 8.0.
(a) Protein eluting in peak 1 contained aggregated IGF-1R/462 protein, peak 2
contained monomeric protein and peak 3 contained the c-myc undecapeptide
used for elution from the Mab 9E10 immunoaffinity column. (b) Non-reduced
SDS-PAGE of fraction 2 from IGF-1R/462 obtained following
Superdex S200 (Fig. 1a). Standard proteins are indicated.

Figure 4. Ion exchange chromatography of affinity-purified, truncated IGF-1R
ectodomain. A mixture of gradient and isocratic elution chromatography
was performed on a Resource Q column (Pharmacia) fitted to a BioLogic
System (Biorad), using 20 mM Tris/pH 8.0 as buffer A and the same buffer
containing 1 M NaCl as buffer B. Protein solution in TBSA was diluted at least
1:2 with water and loaded onto the column at 2 ml/min. Elution was
monitored by absorbance (280 nm) and conductivity (mS/cm). Target protein
(peak 2) eluted isocratically with 20 mM Tris/0.14 M NaCl pH 8.0. Inset:
Isoelectric focusing gel (pH 3 - 7; Novex Australia Pty Ltd) of fraction 2. The
pI was estimated at 5.1 from standard proteins (not shown).

Figure 5. Polypeptide fold for residues 1-462 of IGF-1R. The L1 domain is at
the top, viewed from the N-terminal end, and L2 is at the bottom. The space
at the centre is of sufficient size to accommodate IGF-1. Helices are
indicated by curled ribbon and β-strands by arrows. Cysteine side chains are
drawn as ball-and-stick with lines showing disulfide bonds. The arrow
points in the direction of view for L1 in Figure 7.

Figure 6. Amino acid sequences of IGF-1R and related proteins. (a) L1 and L2
domains of human IGF-1R and IR are shown based on a sequence alignment
for the two proteins and a structural alignment for the L1 and L2 domains.
Positions showing conservation of physico-chemical properties of amino acids
are boxed, residues used in the structural alignment are shown in Times
Italic and residues which form the Trp 176 pocket are in Times Bold.
Secondary structure elements for L1 (above the sequences) and L2 (below)
are indicated as cylinders for helices and arrows for β-strands. Strands are
shaded (pale, medium and dark grey) according to the β-sheet to which they
belong. Disulfide bonds are also indicated. (b) Cys-rich domains of human
IGF-1R, IR and EGFR (domains 2 and 4) are aligned based on sequence and
structural considerations. Secondary structural elements and disulfide bonds
are indicated above the sequences. The dashed bond is only present in IR.
Different types of disulfide bonded modules are labelled below the sequences
as open, filled or broken lines. Boxed residues show conservation of
physico-chemical properties and structurally conserved residues for modules 4-7
are shown in Times Italic. Residues from EGFR which do not conform to the
pattern are in lowercase with probable disulfide bonding indicated below, and
the conserved Trp 176 and the semi-conserved Gln 182 are in Times Bold.

Figure 7. Stereo view of a superposition of the L1 (white) and L2 (black)
domains. Residue numbers above are for L1 and below for L2. The side
chain of Trp 176, which protrudes into the core of L1, is drawn as
ball-and-stick.

Figure 8. Schematic diagram showing the association of three β-finger
motifs. β-strands are drawn as arrows and disulfide bonds as zigzags.

Figure 9. Sequence alignment of hIGF-1R, hIR and hIRR ectodomains,
derived by use of the PileUp program in the software package of the Genetics
Computer Group, 575 Science Drive, Madison, Wisconsin, USA.
For assignment of homologous 3D structures see Figure 6.

Figure 10. Gel filtration chromatography of insulin receptor ectodomain
and MFab complexes. hIR -11 ectodomain dimer (5 - 20 mg) was complexed
with MFab derivatives (15-25 mg each) of the anti-hIR antibodies 18-44, 83-7
and 83-14 (Soos et al., 1986). Elution profiles were generated from samples
loaded on to a Superdex S200 column (Pharmacia), connected to a BioLogic
chromatography system (Biorad) and monitored at 280 nm. The column was
eluted at 0.8 ml/min with 40 mM Tris/150 mM sodium chloride/0.02%
sodium azide buffer adjusted to pH 8.0. Profile 0, hIR -11 ectodomain; Profile
1, ectodomain mixed with MFab 18-44; Profile 2, ectodomain mixed with
MFab 18-44 and MFab 83-14; Profile 3, ectodomain mixed with MFab 18-44,
MFab 83-14 and MFab 83-7. The apparent mass of each complex was
determined from a plot of the following standard proteins: thyroglobulin (660
kDa), ferritin (440 kDa), bovine gamma globulin (158 kDa), bovine serum
albumin (67 kDa), chicken ovalbumin (44 kDa) and equine myoglobin (17
kDa).

Figure 11. Schematic representations of electron microscopy images of the
hIR ectodomain dimer.

Detailed Description of the Invention 

We describe herein the expression, purification, and crystallization of
a recombinant truncated IGF-1R fragment (residues 1-462) containing the
L1-cysteine-rich-L2 region of the ectodomain. The selected truncation position is
just downstream of the exon 6/exon 7 junction (Abbott, A. M., et al., 1992, J.
Biol. Chem. 267:10759-10763), and occurs at a position where the sequences
of the IR and EGFR families diverge markedly (Ward, C. W., et al., 1995,
Proteins: Struct., Funct., Genet. 22:141-153; Lax, I., et al., 1988, Molec.
Cellul. Biol. 8:1970-1978), suggesting it represents a domain boundary. To
limit the effects of glycosylation, the IGF-1R fragment was expressed in Lec8
cells, a glycosylation mutant of Chinese hamster ovary (CHO) cells, whose
defined glycosylation defect produces N-linked oligosaccharides truncated at
N-acetyl glucosamine residues distal to mannose residues (Stanley, P., 1989,
Molec. Cellul. Biol. 9:377-383). Such an approach has facilitated glycoprotein
crystallization (Davis, S. J., et al., 1993, Protein Eng. 6:229-232; Liu, J., et al.,
1996, J. Biol. Chem. 271:33639-33646).

The IGF-1R construct described herein includes a c-myc peptide tag
(Hoogenboom, H. R., et al., 1991, Nucleic Acids Res. 19:4133-4137) that is
recognised by the Mab 9E10 (Evan, G. I., et al., 1985, Mol. Cell. Biol.
5:3610-3616), enabling the expressed product to be purified by peptide elution
from an antibody affinity column followed by gel filtration over Superdex S200.
The purified proteins crystallized under a sparse matrix screen (Jancarik, J. &
Kim, S.-H., 1991, J. Appl. Cryst. 24:409-411) but the crystals were of variable
quality, with the best diffracting to 3.0-3.5 Å. Isocratic gradient elution by
anion-exchange chromatography yielded protein that was less heterogeneous
and gave crystals of sufficient quality to determine the structure of the first
three domains of the human IGF-1R.

The IGF-1R fragment consisted of residues 1-462 of IGF-1R linked via
an enterokinase-cleavable pentapeptide sequence to an eleven residue c-myc
peptide tag at the C-terminal end. The fragment was expressed in Lec8 cells
by continuous media perfusion in a bioreactor using porous carrier disks. It
was secreted into the culture medium and purified by peptide elution from
an anti-c-myc antibody column followed by Superdex S200 gel filtration. The
receptor fragment bound two anti-IGF-1R monoclonal antibodies, 24-31 and
24-60, which recognize conformational epitopes, but could not be shown to
bind IGF-1 or IGF-2. Crystals of variable quality were grown as rhombic
prisms in 1.7 M ammonium sulfate at pH 7.5, with the best diffracting to
3.0-3.5 Å. Further purification by isocratic elution on an anion-exchange column
gave protein which produced better quality crystals, diffracting to 2.6 Å, that
were suitable for X-ray structure determination.

The structure of this fragment (IGF-1R residues 1-462; L1-cys-rich-L2 
domains) has been determined to 2.6 Å resolution by X-ray diffraction. The L 
domains each adopt a compact shape consisting of a single-stranded 
right-handed β-helix. The cys-rich region is composed of eight disulphide-bonded 
modules, seven of which form a rod-shaped domain with modules associated 
in a novel manner. At the centre of this reasonably extended structure is a 
space, bounded by all three domains, and of sufficient size to accommodate a 
ligand molecule. Functional studies on IGF-1R and other members of the 
insulin receptor family show that the regions primarily responsible for 
hormone-binding map to this central site. Thus this structure gives a first 
view of how members of the insulin receptor family might interact with their 
ligands. 

Another group has reported the crystallization of a related receptor, 
the EGFR, in a complex with its ligand EGF (Weber, W., et al., 1994, J. 
Chromat. 679:181-189). However, difficulties were encountered with these 
crystals, which diffracted to only 6 Å, insufficient for the determination of an 
atomic resolution structure of this complex (Weber, W., et al., 1994, J. 
Chromat. 679:181-189) or the generation of accurate models of structurally 
related receptor domains such as IGF-1R and IR by homology modelling. 

The present inventors have developed 3D structural information 
about cytokine receptors in order to enable a more accurate understanding of 
how the binding of ligand leads to signal transduction. Such information 
provides a rational basis for the development of ligands for specific 
therapeutic applications, something that heretofore could not have been 
predicted de novo from available sequence data. 

The precise mechanisms underlying the binding of agonists and 
antagonists to the IGF-1R site are not fully clarified. However, the binding of 
ligands to the receptor site, preferably with an affinity in the order of lO'^M 
or higher, is understood to arise from enhanced stereochemical 
complementarity relative to naturally occurring IGF-1 ligands. 

Such stereochemical complementarity, pursuant to the present 
invention, is characteristic of a molecule that matches intra-site surface 
residues lining the groove of the receptor site as enumerated by the 
coordinates set out in Figure 1. The residues lining the groove are depicted 
in Figure 2. By "match" we mean that the identified portions interact with 
the surface residues, for example, via hydrogen bonding or by 
enthalpy-reducing Van der Waals interactions which promote desolvation of the 
biologically active substance within the site, in such a way that retention of 
the biologically active substance within the groove is favoured energetically. 

Substances which are complementary to the shape of the receptor site 
characterised by amino acids positioned at atomic coordinates set out in 
Figure 1 may be able to bind to the receptor site and, when the binding is 
sufficiently strong, substantially prohibit binding of the naturally occurring 
ligands to the site. 

It will be appreciated that it is not necessary that the 
complementarity between ligands and the receptor site extend over all 
residues lining the groove in order to inhibit binding of the natural ligand. 
Accordingly, agonists or antagonists which bind to a portion of the residues 
lining the groove are encompassed by the present invention. 

In general, the design of a molecule possessing stereochemical 
complementarity can be accomplished by means of techniques that optimize, 
either chemically or geometrically, the "fit" between a molecule and a target 
receptor. Known techniques of this sort are reviewed by Sheridan and 
Venkataraghavan, Acc. Chem. Res. 1987 20 322; Goodford, J. Med. Chem. 
1984 27 557; Beddell, Chem. Soc. Reviews 1985, 279; Hol, Angew. Chem. 
1986 25 767 and Verlinde, C. L. M. J. & Hol, W. G. J., Structure 1994, 2, 577, the 
respective contents of which are hereby incorporated by reference. See also 
Blundell et al., Nature 1987 326 347 (drug development based on information 
regarding receptor structure). 

Thus, there are two preferred approaches to designing a molecule, 
according to the present invention, that complements the shape of IGF-1R or 
a related receptor molecule. By the geometric approach, the number of 
internal degrees of freedom (and the corresponding local minima in the 
molecular conformation space) is reduced by considering only the geometric 
(hard-sphere) interactions of two rigid bodies, where one body (the active 
site) contains "pockets" or "grooves" that form binding sites for the second 
body (the complementing molecule, as ligand). The second preferred 
approach entails an assessment of the interaction of respective chemical 
groups ("probes") with the active site at sample positions within and around 
the site, resulting in an array of energy values from which three-dimensional 
contour surfaces at selected energy levels can be generated. 

The geometric approach is illustrated by Kuntz et al., J. Mol. Biol. 
1982 161 269, the contents of which are hereby incorporated by reference, 
whose algorithm for ligand design is implemented in a commercial software 
package distributed by the Regents of the University of California and further 
described in a document, provided by the distributor, which is entitled 
"Overview of the DOCK Package, Version 1.0.", the contents of which are 
hereby incorporated by reference. Pursuant to the Kuntz algorithm, the 
shape of the cavity represented by the IGF-1R site is defined as a series of 
overlapping spheres of different radii. One or more extant data bases of 
crystallographic data, such as the Cambridge Structural Database System 
maintained by Cambridge University (University Chemical Laboratory, 
Lensfield Road, Cambridge CB2 1EW, U.K.) and the Protein Data Bank 
maintained by Brookhaven National Laboratory (Chemistry Dept., Upton, NY 
11973, U.S.A.), is then searched for molecules which approximate the shape 
thus defined. 

Molecules identified in this way, on the basis of geometric 
parameters, can then be modified to satisfy criteria associated with chemical 
complementarity, such as hydrogen bonding, ionic interactions and Van der 
Waals interactions. 
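
The geometric (hard-sphere) matching idea can be illustrated with a minimal sketch. It is not the DOCK algorithm itself: the sphere list, the coordinates, the clash radius and the function names below are all hypothetical and chosen only to show how a candidate molecule might be scored against a site described by overlapping spheres.

```python
import math

# Hypothetical site description: (x, y, z, radius) of overlapping spheres
# filling the receptor groove. Values are illustrative, not from Figure 1.
SITE_SPHERES = [(0.0, 0.0, 0.0, 3.0), (2.5, 0.0, 0.0, 2.5), (5.0, 0.5, 0.0, 2.0)]


def inside_any_sphere(atom, spheres):
    """Return True if the atom centre lies within at least one site sphere."""
    return any(math.dist(atom, (sx, sy, sz)) <= r for sx, sy, sz, r in spheres)


def shape_score(ligand_atoms, spheres, receptor_atoms, clash_radius=2.0):
    """Fraction of ligand atoms inside the site; 0.0 on any hard-sphere clash."""
    for atom in ligand_atoms:
        if any(math.dist(atom, rec) < clash_radius for rec in receptor_atoms):
            return 0.0
    filled = sum(inside_any_sphere(atom, spheres) for atom in ligand_atoms)
    return filled / len(ligand_atoms)


# Hypothetical rigid ligand pose and receptor atoms (angstroms).
ligand = [(0.5, 0.0, 0.0), (2.0, 0.5, 0.0), (4.5, 0.0, 0.5), (9.0, 0.0, 0.0)]
receptor = [(0.0, 5.0, 0.0), (5.0, -4.0, 0.0)]
print(shape_score(ligand, SITE_SPHERES, receptor))
```

In this toy pose three of the four ligand atoms fall inside the site spheres and none clashes with the receptor. A practical search would also sample rigid-body rotations and translations of each database molecule, which this sketch omits.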

The chemical-probe approach to ligand design is described, for 
example, by Goodford, J. Med. Chem. 1985 28 849, the contents of which are 
hereby incorporated by reference, and is implemented in several commercial 
software packages, such as GRID (product of Molecular Discovery Ltd., West 
Way House, Elms Parade, Oxford OX2 9LL, U.K.). Pursuant to this approach, 
the chemical prerequisites for a site-complementing molecule are identified 
at the outset, by probing the active site (as represented via the atomic 
coordinates shown in Fig. 1) with different chemical probes, e.g., water, a 
methyl group, an amine nitrogen, a carboxyl oxygen, and a hydroxyl. 
Favored sites for interaction between the active site and each probe are thus 
determined, and from the resulting three-dimensional pattern of such sites a 
putative complementary molecule can be generated. 
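
The probe-on-a-grid idea can be sketched as follows. This is an illustration only and does not reproduce the GRID force field: the energy function is a generic Lennard-Jones 12-6 term plus a Coulomb term, and the site atoms, partial charges, well depth, contact distance and dielectric constant are all assumed values.

```python
import math

# Hypothetical site atoms: (x, y, z, partial_charge). Illustrative values only.
SITE_ATOMS = [(0.0, 0.0, 0.0, -0.5), (4.0, 0.0, 0.0, 0.3)]


def probe_energy(point, probe_charge, atoms, epsilon=0.2, r_min=3.5, dielectric=4.0):
    """Lennard-Jones 12-6 plus Coulomb energy (kcal/mol) of a probe at `point`."""
    total = 0.0
    for x, y, z, charge in atoms:
        r = max(math.dist(point, (x, y, z)), 0.5)  # floor avoids division by zero
        ratio6 = (r_min / r) ** 6
        lennard_jones = epsilon * (ratio6 * ratio6 - 2.0 * ratio6)
        coulomb = 332.0 * probe_charge * charge / (dielectric * r)
        total += lennard_jones + coulomb
    return total


def scan_grid(atoms, probe_charge, lo=-2.0, hi=6.0, step=1.0):
    """Evaluate the probe on a cubic grid and return {grid_point: energy}."""
    n = int(round((hi - lo) / step)) + 1
    axis = [lo + i * step for i in range(n)]
    return {(x, y, z): probe_energy((x, y, z), probe_charge, atoms)
            for x in axis for y in axis for z in axis}


grid = scan_grid(SITE_ATOMS, probe_charge=0.4)  # assumed polar, positively charged probe
best_point = min(grid, key=grid.get)
print(best_point, round(grid[best_point], 2))
```

Contouring the resulting energy array at a chosen level gives the favoured regions for that probe; repeating the scan with different probe parameters gives one map per chemical group.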

The chemical-probe approach is especially useful in defining variants 
of a molecule known to bind the target receptor. Accordingly, 
crystallographic analysis of IGF-1 bound to the receptor site is expected to 
provide useful information regarding the interaction between the archetype 
ligand and the active site of interest. 

Programs suitable for searching three-dimensional databases to 
identify molecules bearing a desired pharmacophore include: MACCS-3D and 
ISIS/3D (Molecular Design Ltd., San Leandro, CA), ChemDBS-3D (Chemical 
Design Ltd., Oxford, U.K.), and Sybyl/3DB Unity (Tripos Associates, St. 
Louis, MO). 

Programs suitable for pharmacophore selection and design include: 
DISCO (Abbott Laboratories, Abbott Park, IL), Catalyst (Bio-CAD Corp., 
Mountain View, CA), and ChemDBS-3D (Chemical Design Ltd., Oxford, 
U.K.). 

Databases of chemical structures are available from a number of 
sources including Cambridge Crystallographic Data Centre (Cambridge, U.K.) 
and Chemical Abstracts Service (Columbus, OH). 

De novo design programs include Ludi (Biosym Technologies Inc., 
San Diego, CA), Sybyl (Tripos Associates) and Aladdin (Daylight Chemical 
Information Systems, Irvine, CA). 

Those skilled in the art will recognize that the design of a mimetic 
may require slight structural alteration or adjustment of a chemical structure 
designed or identified using the methods of the invention. 

The invention may be implemented in hardware or software, or a 
combination of both. However, preferably, the invention is implemented in 
computer programs executing on programmable computers each comprising 
a processor, a data storage system (including volatile and non-volatile 
memory and/or storage elements), at least one input device, and at least one 
output device. Program code is applied to input data to perform the 
functions described above and generate output information. The output 
information is applied to one or more output devices, in known fashion. The 
computer may be, for example, a personal computer, microcomputer, or 
workstation of conventional design. 

Each program is preferably implemented in a high level procedural or 
object-oriented programming language to communicate with a computer 
system. However, the programs can be implemented in assembly or machine 
language, if desired. In any case, the language may be a compiled or 
interpreted language. 

Each such computer program is preferably stored on a storage 
medium or device (e.g., ROM or magnetic diskette) readable by a general or 
special purpose programmable computer, for configuring and operating the 
computer when the storage media or device is read by the computer to 
perform the procedures described herein. The inventive system may also be 
considered to be implemented as a computer-readable storage medium, 
configured with a computer program, where the storage medium so 
configured causes a computer to operate in a specific and predefined manner 
to perform the functions described herein. 

Compounds designed according to the methods of the present 
invention may be assessed by a number of in vitro and in vivo assays of 
hormone function. For example, the identification of IGF-1R antagonists 
may be undertaken using a solid-phase receptor binding assay. Potential 
antagonists may be screened for their ability to inhibit the binding of 
europium-labelled IGF ligands to soluble, recombinant IGF-1R in a 
microplate-based format. Europium is a lanthanide fluorophore, the presence 
of which can be measured using time-resolved fluorometry. The sensitivity of 
this assay matches that achieved by radioisotopes, measurement is rapid and 
is performed in a microplate format to allow high sample throughput, and the 
approach is gaining wide acceptance as the method of choice in the 
development of screens for receptor agonists/antagonists (see Apell et al., J. 
Biomolec. Screening 3:19-27, 1998; Inglese et al., Biochemistry 37:2372- 
2377, 1998). 

Binding affinity and inhibitor potency may be measured for 
candidate inhibitors using biosensor technology. 
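
As an illustration of how competition data from such a binding assay might be reduced to an inhibitor potency, the following sketch converts raw signal counts to percent inhibition and estimates an IC50 by interpolation on a logarithmic concentration scale. The counts, concentrations and function names are hypothetical, and log-linear interpolation is only one simple choice; a full analysis would normally fit a dose-response curve.

```python
import math


def percent_inhibition(signal, total_binding, nonspecific):
    """Percent inhibition of specific binding for one well."""
    specific = total_binding - nonspecific
    return 100.0 * (1.0 - (signal - nonspecific) / specific)


def ic50_by_interpolation(concs, inhibitions):
    """Estimate IC50 by linear interpolation on log10(concentration).

    `concs` must be in ascending order; returns None if 50% is never crossed.
    """
    points = list(zip(concs, inhibitions))
    for (c_lo, i_lo), (c_hi, i_hi) in zip(points, points[1:]):
        if i_lo <= 50.0 <= i_hi and i_hi != i_lo:
            frac = (50.0 - i_lo) / (i_hi - i_lo)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10.0 ** log_c
    return None


# Hypothetical counts: no competitor = 10000, nonspecific background = 1000.
concs_nM = [1.0, 10.0, 100.0, 1000.0]
counts = [9100.0, 7300.0, 3700.0, 1900.0]
inhib = [percent_inhibition(c, 10000.0, 1000.0) for c in counts]
ic50 = ic50_by_interpolation(concs_nM, inhib)
print([round(i, 1) for i in inhib], ic50)
```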

The IGF-1R antagonists may be tested for their ability to modulate 
receptor activity using a cell-based assay incorporating a stably transfected, 
IGF-1-responsive reporter gene [Souriau, C., Fort, P., Roux, P., Hartley, O., 
LeFranc, M-P. and Weill, M., 1997, Nucleic Acids Res. 25, 1585-1590]. An 
IGF-1-responsive, luciferase reporter gene has been assembled and 
transfected in 293 cells. The assay addresses the ability of IGF-1 to activate 
the reporter gene in the presence of novel ligands. It offers a rapid (results 
within 6-8 hours of hormone exposure), high-throughput (assay can be 
conducted in a 96-well format for automated counting) analysis using an 
extremely sensitive detection system (chemiluminescence). Once candidate 
compounds have been identified, their ability to antagonise signal 
transduction via the IGF-1R can be assessed using a number of routine in 
vitro cellular assays such as inhibition of IGF-1-mediated cell proliferation, 
induction of apoptosis in the presence of IGF-1 and the ablation of IGF-1- 
driven anchorage-independent cell growth in soft agar [D'Ambrosio, C., 
Ferber, A., Resnicoff, M. and Baserga, R., 1996, Cancer Res. 56, 4013-4020]. 

Such assays may be conducted on the P6 cell line, a cell line highly 
responsive to IGF as a result of the constitutive overexpression of the IGF-1R 
(45-50,000 receptors/cell, [Pietrzkowski, Z., Sell, C., Lammers, R., Ullrich, A. 
and Baserga, R., 1992, Cell Growth Diff. 3, 199-205]). Ultimately, the efficacy 
of any antagonist as a tumour therapeutic may be tested in vivo in animals 
bearing tumour isografts and xenografts as described [Resnicoff, M., Burgaud, 
J-L., Rotman, H. L., Abraham, D. and Baserga, R., 1995, Cancer Res. 55, 3739- 
3741; Resnicoff, M., Sell, C., Rubini, M., Coppola, D., Ambrose, D., Baserga, 
R. and Rubin, R., 1994, Cancer Res. 54: 2218-2222]. 

Tumour growth inhibition assays may be designed around a nude 
mouse xenograft model using a range of cell lines. The effects of the receptor 
antagonists and inhibitors may be tested on the growth of subcutaneous 
tumours. 

A further use of the structure of the IGF-1R fragment described here 
is in facilitating structure determination of a related protein, such as a larger 
fragment of this receptor, another member of the insulin receptor family or a 
member of the EGF receptor family. This new structure may be either of the 
protein alone, or in complex with its ligand. For crystallographic analysis 
this is achieved using the method of molecular replacement (Brunger, Meth. 
Enzym. 1997 276 558-580; Navaza and Saludjian, ibid., 581-594; Tong and 
Rossmann, ibid., 594-611; Bentley, ibid., 611-619) in a program such as 
XPLOR. In this procedure diffraction data are collected from a crystalline 
protein of unknown structure. A transform of these data (Patterson function) 
is compared with a Patterson function calculated from a known structure. 
Firstly, one Patterson function is rotated on the other to determine the 
correct orientation of the unknown molecule in the crystal. The translation 
function is then calculated to determine the location of the molecule with 
respect to the crystal axes. Once the molecule has been correctly positioned 
in the unit cell, initial phases for the experimental data may be calculated. 
These phases are necessary for calculation of an electron density map from 
which structural differences may be observed and for refinement of the 
structure. Due to limitations in the method, the search molecule must be 
structurally related to that which is to be determined. However, it is 
sufficient for only part of the unknown structure (e.g. < 50%) to be similar to 
the search molecule. Thus the three-dimensional structure of IGF-1R 
residues 1-462 may be used to solve structures consisting of related receptors, 
enabling a program of drug design as outlined above. 
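
The role of the Patterson function can be illustrated with a one-dimensional toy calculation. The sketch below is not a molecular replacement program and performs no rotation or translation search: it only shows that a map computed from intensities alone, with phases discarded, has peaks at interatomic vectors. The two-atom "structure", its scattering weights and the grid size are hypothetical.

```python
import cmath
import math


def structure_factors(atoms, h_max):
    """F(h) for a 1D unit cell; atoms = [(fractional_x, scattering_weight)]."""
    return {h: sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms)
            for h in range(-h_max, h_max + 1)}


def patterson(intensities, n_points):
    """P(u) = sum_h |F(h)|^2 cos(2 pi h u), sampled at n_points across the cell."""
    return [sum(i_h * math.cos(2 * math.pi * h * k / n_points)
                for h, i_h in intensities.items())
            for k in range(n_points)]


atoms = [(0.10, 6.0), (0.35, 8.0)]          # hypothetical two-atom structure
F = structure_factors(atoms, h_max=20)
I = {h: abs(f) ** 2 for h, f in F.items()}  # phases are discarded here
P = patterson(I, n_points=100)

ORIGIN_WINDOW = 5  # grid points next to u = 0 ignored when picking peaks
candidates = range(ORIGIN_WINDOW, 100 - ORIGIN_WINDOW + 1)
peaks = sorted(candidates, key=lambda k: P[k], reverse=True)[:2]
print(sorted(k / 100 for k in peaks))
```

The two strongest non-origin peaks fall at u = 0.25 and u = 0.75, that is, at plus and minus the separation of the two atoms (0.35 - 0.10). In three dimensions the same property is what allows the Patterson map of a known search model to be compared with that of the unknown crystal.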

In summary, the general principles of receptor-based drug design can 
be applied by persons skilled in the art, using the crystallographic results 
presented above, to produce ligands of IGF-1R or other related receptors, 
having sufficient stereochemical complementarity to exhibit high affinity 
binding to the receptor site. 

The present invention is further described below with reference to 
the following, non-limiting examples. 

EXAMPLE 1 

Expression, Purification and Crystallization of the IGF-1R Fragment. 

Several factors hamper macromolecular crystallization including 
sample selection, purity, stability, solubility (McPherson, A., et al., 1995, 
Structure 3:759-768; Gilliland, G. L. & Ladner, J. E., 1996, Curr. Opin. 
Struct. Biol. 6:595-603), and the nature and extent of glycosylation (Davis, S. 
J., et al., 1993, Protein Eng. 6:229-232). Initial attempts to obtain structural 
data from soluble IGF-1R ectodomain (residues 1-906) protein, expressed in 
Lec8 cells (Stanley, P., 1989, Molec. Cellul. Biol. 9:377-383) and purified by 
affinity chromatography, produced large, well-formed crystals (1.0 mm x 0.2 
mm x 0.2 mm) which gave no discernible X-ray diffraction pattern 
(unpublished data). Similar difficulties have been encountered with crystals 
of the structurally-related epidermal growth factor receptor (EGFR) 
ectodomain, which diffracted to only 6 Å, insufficient for the determination 
of an atomic resolution structure (Weber, W., et al., 1994, J. Chromat. 679:181- 
189). This prompted us to search for a fragment of IGF-1R that was more 
amenable to X-ray crystallographic studies. 

The fragment expressed (residues 1-462) comprises the 
L1-cysteine-rich-L2 region of the ectodomain. The selected truncation position at Val462 
is four residues downstream of the exon 6/exon 7 junction (Abbott, A. M., et 
al., 1992, J. Biol. Chem. 267:10759-10763), and occurs at a position where the 
sequences of the IR and the structurally related EGFR families diverge 
markedly (Lax, I., et al., 1988, Molec. Cell. Biol. 8:1970-1978; Ward, C. W., et 
al., 1995, Proteins: Struct., Funct., Genet. 22:141-153), suggesting that it 
represents a domain boundary. The expression strategy included use of the 
pEE14 vector (Bebbington, C. R. & Hentschel, C. C. G., 1987, In: Glover, D. 
M., ed. DNA Cloning. Academic Press, San Diego. Vol 3, p163) in 
glycosidase-defective Lec8 cells (Stanley, P., 1989, Molec. Cellul. Biol. 9:377- 
383), which produce N-linked oligosaccharides lacking the terminal galactose 
and N-acetylneuraminic acid residues (Davis, S. J., et al., 1993, Protein Eng. 
6:229-232; Liu, J., et al., 1996, J. Biol. Chem. 271:33639-33646). The construct 
contained a C-terminal c-myc affinity tag (Hoogenboom, H. R., et al., 1991, 
Nucl. Acids Res. 19:4133-4137), which facilitated immunoaffinity purification 
by specific peptide elution and avoided aggressive purification conditions. 
These procedures yielded protein which readily crystallized after a further 
gel filtration purification step. This provided a general protocol to enhance 
crystallisation prospects for labile, multidomain glycoproteins. 

The structure of this fragment is of considerable interest, since it 
contains the major determinants governing insulin and IGF-1 binding 
specificity (Gustafson, T. A. & Rutter, W. J., 1990, J. Biol. Chem. 265:18663- 
18667; Andersen, A. S., et al., 1990, Biochemistry, 29:7363-7366; 
Schumacher, R., et al., 1991, J. Biol. Chem. 266:19288-19295; Schumacher, 
R., et al., 1993, J. Biol. Chem. 268:1087-1094; Schaffer, L., et al., 1993, J. Biol. 
Chem. 268:3044-3047; Williams, P. F., et al., 1995, J. Biol. Chem. 270:3012- 
3016), and is very similar to an IGF-1R fragment (residues 1-486) reported to 
act as a strong dominant negative for several growth functions and which 
induces apoptosis of tumour cells in vivo (D'Ambrosio, C., et al., 1996, 
Cancer Res. 56:4013-4020). 

The expression plasmid pEE14/IGF-1R/462 was constructed by inserting the 
oligonucleotide cassette: 

   AatII 
5' GACGTC GACGATGACGATAAG GAACAAAAACTCATC 
    D  V   D  D  D  D  K   E  Q  K  L  I 
          (EK cleavage)   (c-myc tail) 

    S  E  E  D  L  N (Stop) 
   TCAGAAGAGGATCTGAAT TAGAATTC GACGTC 3' 
                        EcoRI  AatII 

encoding an enterokinase cleavage site, c-myc epitope tag (Hoogenboom, H. 
R., et al., 1991, Nucleic Acids Res. 19:4133-4137) and stop codon into the 
AatII site (within codon 462) of Igf-1r cDNA in the mammalian expression 
vector pECE (Ebina, Y., et al., 1985, Cell 40:747-758; kindly supplied by W. J. 
Rutter, UCSF, USA), and introducing the DNA comprising the 5' 1521 bp of 
the cDNA (Ullrich, A., et al., 1986, EMBO J. 5:2503-2512) ligated to the 
oligonucleotide cassette into the EcoRI site of the mammalian plasmid 
expression vector pEE14 (Bebbington, C. R. & Hentschel, C. C. G., 1987, In: 
Glover, D. M., ed. DNA Cloning. Academic Press, San Diego. Vol 3, p163; 
Celltech Ltd., UK). Plasmid pEE14/IGF-1R/462 was transfected into Lec8 
mutant CHO cells (Stanley, P., 1989, Molec. Cellul. Biol. 9:377-383) obtained 
from the American Type Culture Collection (CRL:1737), using Lipofectin 
(Gibco-BRL). Cell lines were maintained after transfection in glutamine-free 
medium (Glasgow modification of Eagle's medium (GMEM; ICN Biomedicals, 
Australia) and 10% dialysed FCS (Sigma, Australia)) containing 25 μM 
methionine sulphoximine (MSX; Sigma, Australia) as described (Bebbington, 
C. R. & Hentschel, C. C. G., 1987, In: Glover, D. M., ed. DNA Cloning. 
Academic Press, San Diego. Vol 3, p163). Transfectants were screened for 
protein expression by Western blotting and sandwich enzyme-linked 
immunosorbent assay (ELISA) (Cosgrove, L., et al., 1995) using monoclonal 
antibody (Mab) 9E10 (Evan et al., 1985) as the capture antibody, and either 
biotinylated anti-IGF-1R Mab 24-60 or 24-31 for detection (Soos et al., 1992; 
gifts from Ken Siddle, University of Cambridge, UK). Large-scale cultivation 
of selected clones expressing IGF-1R/462 was carried out in a Celligen Plus 
bioreactor (New Brunswick Scientific, USA) containing 70 g Fibra-Cel Disks 
(Sterilin, UK) as carriers in a 1.25 L working volume. Continuous perfusion 
culture using GMEM medium supplemented with non-essential amino acids, 
nucleosides, 25 μM MSX and 10% FCS was maintained for 1 to 2 weeks 
followed by the more enriched DMEM/F12 without glutamine, with the same 
supplementation for the next 4-5 weeks. The fermentation production run was 
carried out three times under similar conditions, and resulted in an estimated 
overall yield of 50 mg of receptor protein from 430 L of harvested medium. 
Cell growth was poor during the initial stages of the fermentation when 
GMEM medium was employed, but improved dramatically following the 
switch to the more enriched medium. Target protein productivity was 
essentially constant during the period from ~100 to 700 h of the 760 h 
fermentation, as measured by ELISA using Mab 9E10 as the capture antibody 
and biotinylated Mab 24-31 as the developing antibody. 
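
The reading frame of the oligonucleotide cassette given earlier in this example can be checked with a short script. The sketch below is illustrative only: it uses a codon table restricted to the codons that occur in the cassette, and the helper name `translate` is a hypothetical choice.

```python
# Standard genetic code, restricted to the codons present in the cassette.
CODON_TABLE = {
    "GAC": "D", "GAT": "D", "GTC": "V", "AAG": "K", "AAA": "K",
    "GAA": "E", "GAG": "E", "CAA": "Q", "CTC": "L", "CTG": "L",
    "ATC": "I", "TCA": "S", "AAT": "N", "TAG": "*",
}

# Cassette sequence as written in the text, 5' to 3', spaces removed.
CASSETTE = ("GACGTC" "GACGATGACGATAAG" "GAACAAAAACTCATC"
            "TCAGAAGAGGATCTGAAT" "TAGAATTC" "GACGTC")


def translate(dna):
    """Translate in frame from the first base until the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


peptide = translate(CASSETTE)
print(peptide)
print("EcoRI site (GAATTC) present:", "GAATTC" in CASSETTE)
print("Flanked by AatII sites (GACGTC):",
      CASSETTE.startswith("GACGTC") and CASSETTE.endswith("GACGTC"))
```

The translation is DVDDDDKEQKLISEEDLN: the AatII-encoded Asp-Val, the enterokinase recognition sequence DDDDK, and the eleven-residue c-myc tag EQKLISEEDLN, followed by the stop codon that overlaps the EcoRI site.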

Soluble IGF-1R/462 protein was recovered from harvested 
fermentation medium by affinity chromatography on columns prepared by 
coupling Mab 9E10 to divinyl sulphone-activated agarose beads (Mini-Leak; 
Kem-En-Tec, Denmark) as recommended by the manufacturer. Mini-Leak 
Low and Medium affinity columns with antibody loadings of 1.5-4.5 mg/ml of 
hydrated matrix were obtained, with the loading range of 2.5-3 mg/ml giving 
optimal performance (data not shown). Mab 9E10 was produced by growing 
hybridoma cells (American Type Culture Collection) in serum-free medium 
in the Celligen Plus bioreactor and recovering the secreted antibody (4 g) 
using protein A glass beads (Prosep-A, Bioprocessing Limited, USA). 
Harvested culture medium containing IGF-1R/462 protein was adjusted to pH 
8.0 with Tris-HCl (Sigma), made 0.02% (w/v) in sodium azide and passed at 
3-5 ml/min over 50 ml Mab 9E10 antibody columns at 4°C. Bound protein 
was recovered by recycling a solution of 2-10 mg of the undecamer c-myc 
peptide EQKLISEEDLN (Hoogenboom et al., 1991) in 20 ml of Tris-buffered 
saline containing 0.02% sodium azide (TBSA). Between 65% and 75% of the 
product was recovered from the medium as estimated by ELISA, with a 
further 15-25% being recovered by a second pass over the columns. Peptide 
recirculation (~10 times) through the column eluted bound protein more 
efficiently than a single, slower elution. Residual bound protein was eluted 
with sodium citrate buffer at pH 3.0 into 1 M Tris-HCl pH 8.0 to neutralize 
the eluant, and columns were re-equilibrated with TBSA. 

Gel filtration over Superdex S200 (Pharmacia, Sweden) of affinity- 
purified material showed a dominant protein peak at ~63 kDa, together with 
a smaller quantity of aggregated protein (Figure 3a). The peak protein 
migrated primarily as two closely spaced bands on reduced sodium dodecyl 
sulfate polyacrylamide gel electrophoresis (SDS-PAGE; Figure 3b), reacted 
positively in the ELISA with both Mab 24-60 and Mab 24-31, and gave a 
single sequence corresponding to the N-terminal 14 residues of IGF-1R. No 
binding of IGF-1 or IGF-2 could be detected in the solid plate binding assay 
(Cosgrove et al., 1995, Protein Express Purif. 6:789-798). The IGF-1R/462 
fragment was further purified by ion-exchange chromatography on Resource 
Q (Pharmacia, Sweden). Using shallow salt gradients, protein enriched in the 
slowest migrating SDS-PAGE band was obtained (data not shown), which 
formed relatively large, well-formed crystals (see below). Isoelectric 
focussing showed the presence of one major and two minor isoforms. Protein 
purified on Resource Q with an isocratic elution step of 0.14 M NaCl in 20 
mM Tris-Cl at pH 8.0 (fraction 2, Figure 4) showed less heterogeneity on 
isoelectric focussing (Figure 4 inset) and SDS-PAGE (data not shown) and 
produced crystals of sufficient quality for structure determination (see 
below). 

Crystals were grown by the hanging drop vapour diffusion method 
using purified protein concentrated in Centricon 10 concentrators (Amicon 
Inc, USA) to 5-10 mg/ml in 10-20 mM Tris-HCl pH 8.0 and 0.02% (w/v) 
sodium azide, or 100 mM ammonium sulfate and 0.02% (w/v) sodium azide. 
Crystallization conditions were initially identified using the factorial screen 
(Jancarik, J. & Kim, S.-H., 1991, J. Appl. Cryst. 24:409-411), and then optimised. 
Crystals were examined on an M18XHF rotating anode generator (Siemens, 
Germany) equipped with Franks mirrors (MSC, USA) and RAXIS IIC and IV 
image plate detectors (Rigaku, Japan). 

From the initial crystallization screen of this protein, crystals of 
about 0.1 mm in size grew in one week. Upon refining conditions, crystals of 
up to 0.6 x 0.4 x 0.4 mm could be grown from a solution of 1.7-2.0 M 
ammonium sulfate, 0.1 M HEPES pH 7.5. The crystals varied considerably in 
shape and diffraction quality, growing predominantly as rhombic prisms with 
a length to width ratio of up to 5:1, but sometimes as rhombic bipyramids, 
the latter form being favoured when using material which had been eluted 
from the Mab 9E10 column at pH 3.0. Each crystal showed a minor 
imperfection in the form of very faint lines from the centre to the vertices. 
Protein from dissolved crystals did not appear to be different from the protein 
stock solution when run on an isoelectric focusing gel. Upon X-ray 
examination, the crystals diffracted to 3.0-4.0 Å and were found to belong to 
the space group P2₁2₁2₁ with a = 76.8 Å, b = 99.0 Å, c = 119.6 Å. In the 
diffraction pattern, the crystal variability noted above was manifest as a large 
(1-2°) and anisotropic mosaic spread, with concomitant variation in 
resolution. To improve the quality of the crystals, they were grown in the 
presence of various additives or were recrystallized. These methods failed to 
substantially improve the crystal quality although bigger crystals were 
obtained by recrystallization. The variability in crystal quality appeared to be 
due to protein heterogeneity, as demonstrated by the observation that more 
highly purified protein, eluted isocratically from the Resource Q column and 
showing one major band on isoelectric focusing (Figure 4 inset), produced 
crystals of sufficient quality for structure determination. These crystals 
diffracted to 2.6 Å resolution with cell dimensions a = 77.0 Å, b = 99.5 Å, c 
= 120.1 Å and mosaic spread of 0.5°. Heavy metal derivatives of the 
IGF-1R/462 crystals have been obtained and are leading to the determination of 
an atomic resolution structure of this fragment, which contains the L1, 
cysteine-rich and L2 domains of human IGF-1R. 
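
The reported cell dimensions allow a rough estimate of crystal packing density. The sketch below computes the unit cell volume and the Matthews coefficient for the better-diffracting crystals. Two inputs are assumptions rather than statements from the text: one molecule per asymmetric unit (hence four per cell in this orthorhombic space group), and a molecular weight of 63 kDa taken from the gel filtration estimate.

```python
def unit_cell_volume(a, b, c):
    """Volume (cubic angstroms) of an orthorhombic cell (all angles 90 degrees)."""
    return a * b * c


def matthews_coefficient(volume, mol_weight_da, z):
    """V_M in cubic angstroms per dalton for z molecules in the unit cell."""
    return volume / (mol_weight_da * z)


def solvent_fraction(v_m):
    """Approximate solvent content using the common 1 - 1.23 / V_M estimate."""
    return 1.0 - 1.23 / v_m


# Cell dimensions from the text (angstroms).
volume = unit_cell_volume(77.0, 99.5, 120.1)
# ASSUMPTIONS: 63 kDa molecular weight; one molecule per asymmetric unit (z = 4).
v_m = matthews_coefficient(volume, 63000.0, z=4)
print(round(volume), round(v_m, 2), round(solvent_fraction(v_m), 2))
```

Under these assumptions the crystals would have a fairly high solvent content, which would be in keeping with the large mosaic spread and variable diffraction described above, although that connection is only suggestive.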



EXAMPLE 2 

Structure of the IGF-1R/1-462 

Crystals were cryo-cooled to -170°C in a mother liquor containing 20% 
glycerol, 2.2 M ammonium sulfate and 100 mM Tris at pH 8.0. Native and 
derivative diffraction data were recorded on Rigaku RAXIS IIC or IV area 
detectors using copper Kα radiation from a Siemens rotating anode generator 
with Yale/MSC mirror optics. The space group was P2₁2₁2₁ with a = 77.39 Å, 
b = 99.72 Å, and c = 120.29 Å. Data were reduced using DENZO and 
SCALEPACK (Otwinowski, Z. & Minor, W., 1996, Meth. Enzym. 
276:307-326). Diffraction was notably anisotropic for all crystals examined. 
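
The merging statistic reported for each data set in Table 1 (Rmerge) can be computed from repeated intensity measurements as sketched below. This is a generic illustration of the definition, not the DENZO or SCALEPACK implementation, and the reflection indices and intensities are hypothetical.

```python
def r_merge(observations):
    """R_merge = sum_h sum_j |I_hj - <I_h>| / sum_h sum_j I_hj.

    `observations` maps a reflection index h to its repeated intensity
    measurements I_hj; a reflection measured once adds no deviation.
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        mean_intensity = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_intensity) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


# Hypothetical measurements keyed by Miller index (h, k, l).
obs = {
    (1, 0, 0): [100.0, 110.0, 90.0],
    (0, 2, 0): [50.0, 54.0],
    (0, 0, 4): [200.0],
}
print(round(r_merge(obs), 4))
```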

Phasing by multiple isomorphous replacement (MIR) was performed 
with PROTEIN (Steigemann, W., Dissertation, Technical Univ. Munich, 1974) 
using anomalous scattering for both UO2 and PIP derivatives. Statistics for 
data collection and phasing are given in Table 1. In the initial MIR map 
regions of protein and solvent could clearly be seen, but the path of the 
polypeptide was by no means obvious. That map was subject to solvent 
flattening and histogram matching in DM (Cowtan, K., 1994, Joint CCP4 and 
ESF-EACBM Newslett. Protein Crystallogr. 31:34-38). The structure was 
traced and rebuilt using O (Jones, T. A., et al., 1991, Acta Crystallogr. 
A47:110-119) and refined with X-PLOR 3.851 (Brunger, A. T., 1996, X-PLOR 
Reference Manual 3.851, Yale Univ., New Haven, CT). After 5 rounds of 
rebuilding and energy minimisation the R-factor dropped to 0.279 and Rfree 
= 0.359 for data 7-2.6 Å resolution. The current model contains 458 amino 
acids and 3 N-linked carbohydrates but no solvent molecules. For residues 
with B(Cα) > 70 Å², atomic positions are less reliable (37-42, 155-159, 305, 
336-341, 404-406, 453-458). There is weak electron density for residues 459- 
461, but the c-myc tail appears completely disordered. 
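
The R-factor and Rfree quoted above compare observed and calculated structure factor amplitudes, with Rfree computed only over reflections withheld from refinement. The following sketch shows the arithmetic under simplifying assumptions: the amplitudes are hypothetical, the calculated values are taken to be already on the same scale as the observed ones, and the function names are illustrative rather than those of X-PLOR.

```python
def r_factor(f_obs, f_calc):
    """R = sum | |Fobs| - |Fcalc| | / sum |Fobs| over the supplied reflections."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    return numerator / sum(abs(o) for o in f_obs)


def split_work_free(reflections, free_flags):
    """Separate reflections into the working set and the free (test) set."""
    work = [r for r, is_free in zip(reflections, free_flags) if not is_free]
    free = [r for r, is_free in zip(reflections, free_flags) if is_free]
    return work, free


# Hypothetical (|Fobs|, |Fcalc|) pairs; the last reflection is flagged free.
pairs = [(100.0, 90.0), (50.0, 60.0), (80.0, 80.0), (40.0, 30.0), (20.0, 25.0)]
flags = [False, False, False, False, True]
work, free = split_work_free(pairs, flags)
r_work = r_factor([o for o, _ in work], [c for _, c in work])
r_free = r_factor([o for o, _ in free], [c for _, c in free])
print(round(r_work, 3), round(r_free, 3))
```

Because the free reflections never influence the model, a gap between the two values, as in the figures reported here, gives an indication of how much the working R-factor reflects overfitting.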

The 1-462 fragment consists of the N-terminal three domains of IGF-1R 
(L1, cys-rich, L2), and contains regions of the molecule which dictate 
ligand specificity (17-23). The molecule adopts a reasonably extended 
structure (approximately 40 x 48 x 105 Å) with domain 2 (cys-rich region) 
making contact along the length of domain 1 (L1) but very little contact with 
the third domain (L2) (see Figure 5). This leaves a space at the centre of the 
molecule of approximately 24 Å x 24 Å x 24 Å which is bounded on three 
sides by the three domains of the molecule. The space is of sufficient size to 
accommodate the ligand, IGF-1. 



Table 1 Summary of crystallographic data

Data set(a)  Resol. (Å)  Mean I/σ  Rmerge(b)  Completeness (multiplicity)
Native       2.6         18.7      0.064      0.996 (4.1)
PIP          3.0         15.8      0.060      0.982 (2.2)
UO2Ac2       4.5         7.5       0.095      0.989 (2.3)

Phasing      No. of sites  Rcullis(c)   Phasing power(d)  FOM(e)
             1             0.66, 0.82   1.71, 1.17        0.47/0.71

Refinement       No. of refl.   No. of  Rcryst(f)  Rfree(f)  Bonds(g)  Angles(g)
resolution (Å)   (free)         atoms                        (Å)       (Å)
7.0-2.6          24270 (2693)   3903    0.237      0.304     0.017     0.048

(a) PIP, di-μ-iodobis(ethylenediamine)diplatinum dinitrate; UO2Ac2, uranyl acetate.
(b) Rmerge = Σh Σj |Ih,j − Ih| / Σh Σj Ih,j, where Ih,j is an intensity
measurement j and Ih is the mean intensity for that reflection h.
(c) Rcullis = Σh ||FPH − FP| − |FHcalc|| / Σh ||FPH| − |FP||, where FPH, FP
and FHcalc are, respectively, derivative, native and heavy atom structure
factors for centric reflections h.
(d) Phasing power = Σh |FHcalc| / Σh |ε|, where FHcalc is defined above and
ε is the lack of closure.
(e) FOM (figure of merit) = <cos(Δαh)>, where Δαh is the error in the phase
angle for reflection h. Values are given before and after density
modification at 3.0 and 2.8 Å resolution, respectively.
(f) Rcryst and Rfree are defined in Brunger, A. T., X-PLOR Reference Manual
3.851 (Yale Univ., New Haven, CT, 1996).
(g) r.m.s. deviation from ideal bond and angle-related (1-3) distances.

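As an illustration of the Rmerge statistic defined in the table footnotes, the following is a minimal sketch in Python. The function name, the input layout (pairs of reflection index and measured intensity) and the example intensities are assumptions made for illustration only; they are not data from this specification.

```python
from collections import defaultdict


def r_merge(observations):
    """Compute Rmerge = sum_h sum_j |I_hj - <I_h>| / sum_h sum_j I_hj.

    `observations` is an iterable of (hkl, intensity) pairs, where repeated
    hkl values are repeated measurements j of the same reflection h.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    numerator = 0.0
    denominator = 0.0
    for intensities in groups.values():
        mean_intensity = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_intensity) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


# Hypothetical measurements: two reflections, each measured twice.
example_observations = [
    ((1, 0, 0), 100.0),
    ((1, 0, 0), 110.0),
    ((0, 1, 0), 50.0),
    ((0, 1, 0), 46.0),
]
example_r_merge = r_merge(example_observations)
```

A lower value indicates better agreement between repeated measurements of the same reflection.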
The L domains 

Each of the L domains (residues 1-150 and 300-460) adopts a compact
shape (24 x 32 x 37 Å) consisting of a single-stranded right-handed β-helix
capped on the ends by short α-helices and disulfide bonds. The body of
the domain looks like a loaf of bread, with the base formed from a flat
six-stranded β-sheet, 5 residues long, and the sides being β-sheets three
residues long (Figures 5 & 6). The top is irregular, but in places is similar
for the two domains. The two domains are superposable with an rms deviation
in Cα positions of 1.6 Å for 109 atoms (Figure 7). Although this fold is
reminiscent of other β-helix proteins it is much simpler and smaller with
very few elaborations, and thus it represents a new superfamily of domains.
One notable difference between the two domains is that the indole ring of
Trp 176 from the cys-rich region (Figure 6b) is inserted into the hydrophobic
core of L1, and the C-terminal helix is only vestigial (Figure 8). For the
insulin receptor family the sequence motif of residues which form the Trp
pocket in L1 does not occur in L2 (Figure 6a). However, in the EGF receptor,
which has an additional cys-rich region after the L2 domain (14, 15), the
pocket motif can be found in both L domains and the Trp is conserved in both
cys-rich regions (Figure 6b).
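The rms deviation quoted for the superposed L domains can be sketched as follows. This minimal Python example assumes the two sets of Cα coordinates have already been optimally superposed and paired atom by atom; the function name and the toy coordinates are hypothetical and are not taken from the structure.

```python
import math


def rms_deviation(coords_a, coords_b):
    """Root-mean-square deviation between paired, pre-superposed atoms.

    Each argument is a sequence of (x, y, z) tuples in Angstroms; atom i of
    coords_a is compared with atom i of coords_b.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Toy coordinates (hypothetical): one atom displaced by 2 A, one unchanged.
domain_one = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
domain_two = [(0.0, 0.0, 2.0), (1.0, 0.0, 0.0)]
example_rmsd = rms_deviation(domain_one, domain_two)
```

In practice the superposition itself (finding the rotation and translation that minimise this value) is a separate step that is not shown here.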

The repetitive nature of the β-helix is reflected in the sequence and
the first five turns were correctly identified by Bajaj, M., et al. (1987,
Biochim. Biophys. Acta 916:220-226), the conserved Gly residues being found
in turns making one bottom edge of the domain. However, their conclusions
about the fold were incorrect. The "helix-like" repeat is actually a pair of
bends at the top edge of the domain. In their Motif V, the Gly is not in a
bend but is followed by the insertion of a conserved loop of 7-8 residues (see
Figure 6a). Glycine is structurally important in the Gly bends as mutation of
these residues compromises folding of the receptor [van der Vorm, E. R., et
al., 1992, J. Biol. Chem. 267, 66-71; Wertheimer, E. et al., 1994, J. Biol. Chem.
269, 7587-7592].

Comparison of the L domains with other right-handed β-helix
structures such as pectate lyase (Yoder, M. D., et al., 1993, Structure
1:241-251-1507) and the P22 tailspike protein (Steinbacher, S., et al., 1997,
J. Mol. Biol. 267:865-880) shows some striking similarities as well as
differences. In all cases the ends of the domain are capped by α-helices, but
the L domains also have a disulphide bond at each end to hold the termini.
The other β-helix domains are considerably longer and have significant twist
to their sheets, while the L domains have flat sheets. Although the sizes of
the helix repeats are similar (here 24-25 residues vs 22-23 for pectate lyase)
the cross-sections are quite different. The L domains have a rectangular
cross-section, while pectate lyase and P22 tailspike protein are V-shaped,
and have many, and sometimes quite large, insertions (Yoder, M. D., et al.,
1993, Structure 1:241-251-1507; Steinbacher, S., et al., 1997, J. Mol. Biol.
267:865-880). In the hydrophobic core a common feature is the stacking of
aliphatic residues from successive turns of the β-helix, and near the
C-terminus of each L domain there is also a short Asn ladder, reminiscent of
the long Asn ladder observed in pectate lyase (Yoder, M. D., et al., 1993,
Structure 1:241-251-1507). On the opposite side of the L domains the Gly
bend, as well as the two bends and sheet preceding it, have no counterpart in
the other β-helix domains. Thus although the L domains are built on similar
principles to the other β-helix domains they constitute a separate
superfamily.
The cys-rich domain 

The cys-rich domain is composed of eight disulfide-bonded modules (Figure
6b), the first of which sits at the end of L1, while the remainder make a
curved rod running diagonally across L1 and reaching to L2 (Figure 5). The
strands in modules 2-7 run roughly perpendicular to the axis of the rod in a
manner more akin to laminin (Stetefeld, J., et al., 1996, J. Mol. Biol.
257:644-657) than to TNF receptor (Banner, D. W., et al., 1993, Cell,
73:431-445), but the modular arrangement of the cys-rich domain is different
to those of other cys-rich proteins for which structures are known. The first
3 modules of IGF-1R have a common core, containing a pair of disulfide bonds,
but show considerable variation in the loops (Figure 6b). The connectivity of
these modules is the same as in the first half of EGF (Cys 1-3 and 2-4), but
their structures do not appear to be closely related to any member of the EGF
family. Modules 4 to 7 have a different motif, a β-finger, and best match
residues 2152-2168 of fibrillin (Dowling, A. K., et al., 1996, Cell, 85:597-605).
Each is composed of three polypeptide strands, the first and third being
disulfide bonded and the latter two forming a β-ribbon. The β-ribbon of each
β-finger module lines up antiparallel to form a tightly twisted 8-stranded
β-sheet (Figures 5 and 8). Module 6 deviates from the common pattern, with
the first segment being replaced by an α-helix followed by a large loop that is
likely to have a role in ligand binding (see below). As module 5 is most
similar to module 7 it is possible that the four modules arose from serial gene
duplications. The final module is a disulfide-linked bend of five residues.

The fact that the two major types of cys-rich modules occur
separately implies that these are the minimal building blocks of cys-rich
domains found in many proteins. Although it can be as short as 16 residues,
the motif of modules 4-7 is clearly distinct, and capable of forming a regular
extended structure. Thus cys-rich domains such as these can be considered
as being made of repeat units each composed of a small number of modules.
Hormone binding 

Attempts have been made to locate the IGF-1 (and insulin) binding
site by examining natural (Taylor, S. I., 1992, Diabetes, 41:1473-1490) and
site-directed mutants (Williams, P. F., et al., 1995, J. Biol. Chem.
270:3012-3016; Mynarcik, D. C., et al., 1996, J. Biol. Chem. 271:2439-2442;
Mynarcik, D. C., et al., 1997, J. Biol. Chem. 272:2077-2081), chimeric
receptors (Andersen, A. S., et al., 1990, Biochemistry 29:7363-7366;
Gustafson, T. A., & Rutter, W. J., 1990, J. Biol. Chem. 265:18663-18667;
Schaffer, L., et al., 1993, J. Biol. Chem. 268:3044-3047; Schumacher, R.,
1993, J. Biol. Chem. 268:1087-1094; Kjeldsen, T., et al., 1991, Proc. Natl
Acad. Sci. USA, 88:4404-4408) and by crosslinking studies (Wedekind, F., et
al., 1989, Biol. Chem Hoppe-Seyler, 370:251-258; Fabry, M., 1992, J. Biol.
Chem. 267:8950-8956; Waugh, S. M., et al., 1989, Biochemistry, 28:3448-3458;
Kurose, T., et al., 1994, J. Biol. Chem. 269:29190-29197). IGF-1R/IR chimeras
not only show which regions of the receptors account for ligand specificity,
but also provide an efficient means of identifying some parts of the hormone
binding site. Paradoxically, regions controlling specificity are not the same
for insulin and IGF-1. Replacing the first 68 residues of IGF-1R with those of
IR confers insulin-binding ability on the chimeric IGF-1R (Kjeldsen, T., et
al., 1991, Proc. Natl Acad. Sci. USA, 88:4404-4408), and replacing residues
198-300 in the cys-rich region of IR with the corresponding residues 191-290
of IGF-1R allows the chimeric receptor to bind IGF-1 (Schaffer, L., et al.,
1993, J. Biol. Chem. 268:3044-3047). Thus a receptor can be constructed which
binds both IGF-1 and insulin with near native affinity. From the structure it
is clear that if the hormone bound in the central space it could contact both
these regions.

From analysis of a series of chimeras examined by Gustafson, T. A., &
Rutter, W. J. (J. Biol. Chem. 265:18663-18667, 1990), the specificity
determinant in the cys-rich region can be limited further to residues 223-274.
This region corresponds to modules 4-6, and includes a large and somewhat
mobile loop (residues 255-263, mean B[Cα atoms] = 57 Å²) which extends
into the central space (see Figure 5). In IR this loop is four residues bigger,
and is stabilised by an additional disulfide bond (Schaffer, L. & Hansen,
P. H., 1996, Exp. Clin. Endocrinol. Diabetes, 104: Suppl. 2, 89). The larger
loop of IR may serve to exclude IGF-1 from the hormone binding site while
allowing the smaller insulin molecule to bind. It is interesting to note that
the mosquito IR homologue, which has a loop two residues larger than the
mammalian IRs, also appears to bind insulin but not IGF-1 (Graf, R., et al.,
1997, Insect Molec. Biol. 6:151-163). Analysis of the structure indicates that
the insulin/IGF-1 specificity is controlled by residues in this loop (amino
acids 253-272 in IGF-1R; amino acids 260-283 in IR).

As chimeras only address residues which differ between the two
receptors, a more precise analysis of the site can be obtained from single site
mutants. In particular, from an alanine-replacement study, four regions of L1
important for insulin binding were identified (Williams, P. F., et al., 1995, J.
Biol. Chem. 270:3012-3016). The first three are at similar positions on
successive turns of the β-helix and the fourth lies on the conserved bulge on
the large β-sheet. Thus there is a footprint for insulin binding to the L1
domain which lies on the first half of the large β-sheet facing into the central
space. Residues further along the sheet which are conserved in IGF-1R could
also be important. The conservative substitution of leucine for methionine at
residue 119 of IR (113 of IGF-1R) causes a mild form of leprechaunism [Hone,
J. et al., 1994, J. Med. Genet 31, 715-716]. This residue is buried, and the
mutation could perturb neighbouring residues to affect insulin binding.

The axis of the L2 domain is perpendicular to that of the L1 domain,
and the N-terminal end of its β-helix is presented to the hormone-binding
site. On this face of the L2 domain the only mutation studied so far is the
naturally occurring IR mutant, S323L, which gives rise to Rabson-Mendenhall
syndrome and severe insulin resistance (Roach, P., 1994, Diabetes
43:1096-1102). As this mutant only affects insulin binding and not
cell-surface expression, residue 323 of IR (residue 313 of IGF-1R) is probably
at or near the binding site. Structurally this residue lies in the middle of a
region (residues 309-318 of IGF-1R) which is conserved in both IR and IGF-1R,
and the surrounding region, 332-345 (of IGF-1R), is also quite well conserved
in these receptors (Figure 6a). Therefore this region is quite likely to form
part of the hormone-binding site, but would not have been detected by
chimeras. It is interesting to note that in this region IRR is not as well
conserved as the other two receptors (Shier, P. & Watt, V. M., 1989,
J. Biol. Chem. 264:14605-14608).

The distance from this putative hormone-binding region on L2 to that
found on L1 is about 30 Å (Figure 5). Thus L1 and L2 appear too far apart to
bind IGF-1 or insulin. However, in the crystal structure there is a deep cleft
between part of the cys-rich domain (residue 262) and L2 (residue 305), and
this cleft is occupied by a loop from a neighbouring molecule. Thus it seems
probable that, in the intact receptor or the hormone-receptor complex, the
L2 domain adopts a different position with respect to the cys-rich domain
than that found in the crystal. The movement required to bring L2
sufficiently close to L1 is small, namely a rotation of approximately
25° about residue 298.

A number of IR mutants have been identified which constitutively
activate the receptor, and the majority of these are found in the α chain.
Curiously all α chain mutants involve changes to or from proline or the
deletion of an amino acid, implying that they cause local structural
rearrangements. The mutation R86N is similar to wild type, but R86P
reduces cell-surface expression and insulin binding while constitutively
activating autophosphorylation [Gronskov, K. et al., 1993, Biochem. Biophys.
Res. Commun. 192, 905-911]. The proline mutation probably disturbs
residues preceding 87 which lie in the interface between the L1 and cys-rich
domains, but it could also affect insulin binding. In the cys-rich domain
residues 233, 281, 244 and 247 of IR are not conserved in IGF-1R (Figure 6b),
yet L233P [Klinkhamer, M. P. et al., 1989, EMBO J. 8, 2503-2507], deletion of
N281 [Debois-Mouthon, C. et al., 1996, J. Clin. Endocrinol. Metab. 81,
719-727] or the triple mutant P243R, P244R and H247D [Rafaeloff, R. et al.,
1989, J. Biol. Chem. 264, 15900-15904] cause constitutive kinase activation.
Due to their locations each of these three mutants appears likely to
compromise the folding of a β-finger domain and, in turn, the structural
integrity of the rod-like cys-rich domain. The structural ramifications of
these mutations could be significant for the whole receptor ectodomain, as
disturbing the L1/cys-rich interface or distorting the rod-like domain could
affect the relative position of L1 and the cys-rich domain in this context.

L1 has been further implicated, as deletion of K121 on the opposite
side of L1 from the cys-rich domain was also found to cause
autophosphorylation [Jospe, N. et al., 1994, J. Clin. Endocrinol. Metab. 79,
1294-1302]. By contrast this mutation does not affect insulin binding. Thus a
possible mechanism emerges for insulin binding and signal transduction.
When insulin binds between L1 and L2 it modifies the relative position of L1
and the cys-rich domain in the receptor, perhaps by hinge motion between L2
and the cys-rich domain like that suggested above, and the structural
rearrangement is transmitted across the plasma membrane. In the absence of
insulin the same signal can be initiated by mutations in the cys-rich region or
at the L1/cys-rich interface, but at the expense of insulin binding. The signal
can also be initiated more directly by mutations on the opposite side of L1
which affect the interaction of L1 with other parts of the ectodomain,
possibly the other half of the receptor dimer.
Ligand Studies 

Although there is no structural information about an IGF-1/IGF-1R
complex a number of studies have probed the nature of this interaction.
Results from cross-linking experiments with IGF-1 and insulin and their
cognate receptors are consistent with the hormone binding site proposed
above. For example B29 of insulin can be cross-linked to the cys-rich region
(residues 205-316) (Yip, C. C., et al., 1988, Biochem. Biophys. Res. Commun.
157:321-329) or the L1 domain (Wedekind, F., et al., 1989, Biol. Chem
Hoppe-Seyler, 370:251-258). However, these two regions are reasonably well
separated, and those studies may indicate that B29 is mobile. Other studies
unfortunately do not map the site any more precisely.

Analogues and site-directed mutants of IGF-1 and IGF-2 have been
more fruitful. IGF-1 and IGF-2 contain two extra regions relative to insulin,
the C region between B and A and a D peptide at the C-terminus. For IGF-1,
replacement of the C region by a four Gly linker reduced affinity for IGF-1R
by a factor of 40 but increased affinity for IR 5-fold (Bayne, M. L., et al.,
1988, J. Biol. Chem. 264:11004-11008). Changes in affinity are consistent with
the deletion in IGF-1 complementing differences in the cys-rich regions of
IGF-1R and IR noted above. Mutation of residues either side of the C region
(residue 24 for IGF-1 [Cascieri, M. A., et al., 1988, Biochemistry
27:3229-3233], residues 27, 43 for IGF-2 [Sakano, K., et al., 1991, J. Biol.
Chem. 266:20626-20635]) also has deleterious effects on the affinity of the
hormone for IGF-1R, as has truncation of the nearby D peptide in IGF-2 (Roth,
B. V., et al., 1991, Biochem. Biophys. Res. Commun. 181:907-914).

Insulin has been extensively mutated. Binding studies (summarised
in Kristensen, C. et al., 1997, J. Biol. Chem. 272, 12978-12983) indicate that
insulin may bind its receptor via a hydrophobic patch (residues A2, A3, A19,
B8, B11, B12, B15 and possibly B23 & B24). However this patch is normally
buried, and requires the removal of the B chain's C-terminus from the
observed position. Assuming IGF-1, IGF-2 and insulin bind their receptors in
the same orientation, these data suggest an approximate orientation for the
hormone when bound to the receptor.

One notable feature of IGF-1 and IGF-2 is the large number of
charged residues and their uneven distribution over the surface. Basic
residues are predominantly found in the C region and, in solution, this region
is not well ordered in either IGF-1 or -2 (Sato, A., et al., 1993, Int J Peptide
Protein Res. 41:433-440; Torres, A. M., et al., 1995, J. Mol. Biol. 248:385-401).
In contrast the binding site of the receptor has a sizable patch of acidic
residues in the corner where the cys-rich domain departs from L1. Other
acidic residues which are specific to this receptor are found along the inside
face of the cys-rich domain and the loop (residues 255-263) extending from
module 6. Thus it is possible that electrostatic interactions play an
important part in IGF-1 binding, with the C region binding to the acidic patch
of the cys-rich region near L1 and the acidic patch on the other side of the
hormone directed towards a small patch of basic residues (residues 307-310)
on the N-terminal end of L2.

Although the structure of this fragment gives significant information
about the nature of the hormone binding site, residues outside this region
have also been shown to affect binding of ligand. A number of studies have
implicated residues 704-715 of IR (Mynarcik, D. C., et al., 1996, J. Biol. Chem.
271, 2439-2442; Kurose, T., et al., 1994, J. Biol. Chem. 269:29190-29197).
These residues could contact insulin on one of the sides left open in the
current structure. Using insulin labelled at the B1 residue, Fabry, M., et
al. (1992, J. Biol. Chem. 267:8950-8956) cross-linked insulin to the fragment
390-488, part of which is not near the site as described. The explanation for
this could be either that the region 390-488 reaches back to the hormone
binding site, or that this region contacts another hormone bound to the other
half of the receptor. Further structural information is needed to establish
how these other regions contact the hormone and to elucidate how binding of
the hormone is communicated to the kinase inside the cell.

The structure of the L1-cys-rich-L2 domains of IGF-1R presented here
represents the first structural information for the extracellular portion of a
member of the insulin receptor family. The L domains display a novel fold
which is common to the EGF receptor family, and the modular architecture
of the cys-rich domain implies that smaller building blocks should be used to
describe the composition of cysteine-rich domains. This fragment contains
the major specificity determinants of receptors of this class for their ligands.
It has an elongated structure with a space in the middle which could
accommodate the ligand. The three sides of this site correspond to regions
which have been implicated in hormone binding. Although other sites are
present in the receptor ectodomain which interact with the ligand, this
structure gives us an initial view of how insulin, IGF-1 and IGF-2 might
interact with their cell surface receptors to control their metabolic and
mitogenic effects.

Such information will provide valuable insight into the structure of
the corresponding domains of the IR and insulin receptor-related receptor as
well as members of the related EGFR family (Bajaj, M., et al., 1987, Biochim
Biophys Acta 916:220-226; Ward, C. W. et al., 1995, Proteins: Struct Funct
Genet 22:141-153).
EXAMPLE 3 

Prediction of 3D Structure of the Corresponding Domains of IRR and IR
Based on Structure of IGF-1R Fragment.

The sequence identities between the different members of the insulin
receptor family are sufficient to allow accurate sequence alignments to
facilitate 3D structure predictions by homology modelling. The alignments of
the ectodomains of human IGF-1R, IR, and IRR are shown in Figure 9.
EXAMPLE 4 

Single-Molecule Imaging of Human Insulin Receptor Ectodomain and its 
Fab Complexes 

Cloning and expression of hIR -11 ectodomain protein 

A full length clone of the human IR exon -11 form (hIR -11) was
prepared by exchanging an Aat II fragment, nucleotides 1195 to 2987, of the
exon +11 clone (plasmid pET; Ellis et al., 1986; gift from Dr W. J. Rutter,
UCSF) of hIR (Ebina et al., 1985, Cell 40, 747-758) with the equivalent Aat II
fragment from a plasmid (pHIR/P12-1, ATCC 57493) encoding part of the
extracellular domain and the entire cytoplasmic domain of hIR -11 (Ullrich
et al., 1985, Nature 313, 756-761). The ectodomain fragment of hIR -11
(2901 bp, coding for the 27 residue signal sequence and residues
His1-Asn914) was produced by SalI and SspI digestion and inserted into the
mammalian expression vector pEE6.HCMV-GS (Celltech Limited, Slough,
Berkshire, UK) into which a stop codon linker had been inserted, as
described previously (Cosgrove et al., 1995, Protein Expression and
Purification 6, 789-798) for the hIR exon +11 ectodomain.

The resulting recombinant plasmid pHIR II (2 ng) was transfected
into glycosylation-deficient Chinese hamster ovary (Lec 8) cells (Stanley,
1989, Molec. Cellul. Biol. 9, 377-383) with Lipofectin (Gibco-BRL). After
transfection, the cells were maintained in glutamine-free medium GMEM
(ICN Biomedicals, Australia) as described previously (Bebbington &
Hentschel, 1987, In DNA Cloning (Glover, D., ed.), Vol. III, Academic
Press, San Diego; Cosgrove et al., 1995, Protein Expression and Purification 6,
789-798). Expressing cell lines were selected for growth in GMEM with 25
µM methionine sulphoximine (MSX, Sigma). Transfectants were screened for
protein expression using sandwich ELISA with anti-IR monoclonal antibodies
83-7 and 83-14. Metabolic labelling of cells, immunoprecipitations, insulin
binding assays and Scatchard analyses were performed as described
previously for the exon +11 form of hIR ectodomain (Cosgrove et al., 1995,
Protein Expression and Purification 6, 789-798).
hIR -11 ectodomain production and purification 

The selected clone (inoculum of 1.28 x 10^8 cells) was grown in a
spinner flask packed with 10 g of Fibra-cel disc carriers (Sterilin, U.K.) in 500
ml of GMEM medium containing 10% fetal calf serum (FCS) and 25 µM MSX.
Selection pressure was maintained for the duration of the culture.

Ectodomain was recovered from harvested medium by affinity
chromatography on immobilized insulin, and further purified by gel filtration
chromatography on Superdex S200 (Pharmacia; 1 x 40 cm) in Tris-buffered
saline containing 0.02% sodium azide (TBSA) as described previously
(Cosgrove et al., 1995, Protein Expression and Purification 6, 789-798).
Solutions of purified hIR -11 ectodomain were stored at 4 °C prior to use.
Production of Fab fragments and their complexes with ectodomain

Purification of Mabs 83-7, 83-14 and 18-44 from ascites fluid by
affinity chromatography using Protein A-Sepharose, and the production of
Fabs, were based on the methodologies described in Coligan et al., 1993,
Current Protocols in Immunology, Vol 1, pp 2.7.1-2.8.9, Greene Publishing
Associates & Wiley-Interscience, John Wiley and Sons. Fab was produced
from monoclonal antibody by mercuripapain digestion for 1-4 h, followed by
gel filtration on Superdex S200. Products were monitored by reducing and
non-reducing SDS-PAGE. For 83-7 Mab, an IgG Type 1 monoclonal antibody,
the bivalent F(ab')2 isolated by this method was reduced to monovalent Fab
83-7 by mild reduction with mM L-cysteine.HCl in 100 mM Tris pH 8.0
(Coligan et al., 1993, Current Protocols in Immunology, Vol 1, pp 2.7.1-2.8.9,
Greene Publishing Associates & Wiley-Interscience, John Wiley and Sons).

Complexes of Fab with hIR -11 ectodomain were produced by mixing
~2.5 to 3.5 molar excess of Fab with hIR -11 ectodomain at ambient
temperature in TBSA at pH 8.0. After 1-3 h, the complex was separated from
unbound Fab by gel filtration over a Superdex S200 column in the same
buffer.

Electron microscopy 

Uncomplexed hIR -11 ectodomain and the Fab complexes described
above were diluted in phosphate-buffered saline (PBS) to concentrations of
the order of 0.01-0.03 mg/ml. Prior to dilution, 10% glutaraldehyde (Fluka)
was added to the PBS to achieve a final concentration of 1% glutaraldehyde.
Droplets of ~3 µl of this solution were applied to thin carbon film on
700-mesh gold grids after glow-discharging in nitrogen for 30 s. After 1 min
the excess protein solution was drawn off, followed by application and
withdrawal of 4-5 droplets of negative stain [2% uranyl acetate (Agar), 2%
uranyl formate (K and K), 2% potassium phosphotungstate (Probing and
Structure) adjusted to pH 6.0 with KOH, or 2% methylamine tungstate (Agar)
adjusted to pH 6.8 with NH4OH]. In the case of both uranyl acetate and
uranyl formate staining, an intermediate wash with 2 or 3 droplets of PBS
was included prior to application of the stain. The grids were air-dried and
then examined at 60 kV accelerating voltage in a JEOL 100B transmission
electron microscope at a magnification of 100,000x. It was found that there
was a typical thickness of negative stain in which Fabs were most easily
seen. Hence areas for photography had to be chosen from particular zones of
the grid. Electron micrographs were recorded on Kodak SO-163 film and
developed in undiluted Kodak D19 developer. The electron-optical
magnification was calibrated under identical imaging conditions by recording
single-molecule images of the antigen-antibody complex of influenza virus
neuraminidase heads and NC10 MFab (Tulloch et al., 1986, J. Mol. Biol. 190,
215-225; Malby et al., 1994, Structure, 2, 733-746).
Image processing 

Electron micrographs showing particles in a limited number of
identifiable projections were chosen for digitisation. Micrographs were
digitised on a Perkin-Elmer model 1010 GMS PDS flatbed scanning
microdensitometer with a scanning aperture (square) size of 20 µm and
stepping increment of 20 µm corresponding to a distance of 0.2 nm on the
specimen. Particles were selected from the digitised micrograph using the
interactive windowing facility of the SPIDER image processing system (Frank
et al., 1996, J. Struct. Biol. 116, 190-199). Particles were scaled to an optical
density range of 0.0 - 2.0 and aligned by the PSPC reference-free alignment
algorithm (Marco et al., 1996, Ultramicroscopy, 66, 5-10). Averages were then
calculated over a subset of correctly aligned particles chosen interactively as
being representative of a single view of the particle. The final average image
presented here is derived from a library of 94 images.
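The sampling arithmetic and the scale-then-average steps described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the scanning step is taken as 20 µm (the value consistent with the stated 0.2 nm on the specimen at 100,000x), the images are assumed to be already aligned, and all function names and the tiny example images are hypothetical. It does not reproduce SPIDER or the PSPC alignment algorithm.

```python
def specimen_pixel_nm(step_um, magnification):
    """Distance on the specimen (nm) covered by one scanner step on the film."""
    return step_um * 1000.0 / magnification


def rescale(image, low=0.0, high=2.0):
    """Linearly rescale a 2D list of densities to the range [low, high]."""
    values = [v for row in image for v in row]
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        raise ValueError("image has no contrast to rescale")
    scale = (high - low) / (v_max - v_min)
    return [[low + (v - v_min) * scale for v in row] for row in image]


def average_images(images):
    """Pixel-wise mean of equally sized, pre-aligned 2D images."""
    count = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    return [
        [sum(img[r][c] for img in images) / count for c in range(cols)]
        for r in range(rows)
    ]


pixel_nm = specimen_pixel_nm(20.0, 100000)            # 20 um step at 100,000x
scaled = rescale([[1.0, 3.0], [5.0, 9.0]])            # hypothetical raw densities
mean_image = average_images([[[0.0, 2.0]], [[2.0, 0.0]]])
```

Averaging many aligned particle images in this way reinforces features common to the chosen view while suppressing uncorrelated noise.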
Biochemical characterization of expressed hIR -11 ectodomain 

The recombinant protein examined corresponded to the first 914
residues of the 917 residue ectodomain of the exon -11 form of the human
insulin receptor (Ullrich et al., 1985, Nature 313, 756-761). Expressed protein
was shown, by SDS-PAGE and autoradiography of immunoprecipitated
product from metabolically labelled cells, to exist as a homodimeric complex
of ~270-320 kDa apparent mass, which dissociated under reducing
conditions into monomeric α and β' subunits of respective apparent mass
~120 kDa and ~35 kDa (data not shown).

Purified hIR -11 ectodomain, expressed in Lec 8 cells and purified by
affinity chromatography on an insulin affinity column, eluted as a
symmetrical peak on a Superdex S200 gel filtration column (Figure 10). The
protein eluted with an apparent mass of ~400 kDa, calculated from a
standard curve generated by the elution positions of standard proteins (not
shown). As expected for protein expressed in Lec 8 cells, whose
glycosylation defect produces truncated oligosaccharides (Stanley, 1989,
Molec. Cellul. Biol. 9, 377-383), this value is less than the apparent mass
(450-500 kDa) reported for hIR +11 ectodomain expressed in wild-type CHO-K1
cells (Johnson et al., 1988, Proc. Natl Acad. Sci. USA 85, 7516-7520; Cosgrove
et al., 1995, Protein Expression and Purification 6, 789-798).
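An apparent mass read from a gel filtration standard curve can be sketched as below. The sketch assumes the common calibration in which log10(mass) of the standards is fitted as a straight line against elution volume; the specification does not state the form of its standard curve, and the standards, volumes and function names used here are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit returning (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def apparent_mass_kda(elution_ml, standards):
    """Interpolate apparent mass from (elution volume in ml, mass in kDa) standards."""
    volumes = [volume for volume, _ in standards]
    log_masses = [math.log10(mass) for _, mass in standards]
    slope, intercept = fit_line(volumes, log_masses)
    return 10.0 ** (slope * elution_ml + intercept)


# Hypothetical calibration standards lying exactly on a log-linear curve.
calibration = [(10.0, 1000.0), (12.0, 100.0), (14.0, 10.0)]
estimated_mass = apparent_mass_kda(11.0, calibration)
```

Because such a curve reflects hydrodynamic size rather than true mass, a glycoprotein with truncated oligosaccharides would be expected to elute later and so appear smaller, as noted in the text.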

Radioassay of insulin binding to purified ectodomain gave linear
Scatchard plots and Kd values of 1.5 - 1.8 x 10^-9 M, similar to the values of
2.4 - 5.0 x 10^-9 M reported for the hIR -11 ectodomain (Andersen et al., 1990,
Biochemistry 29, 7363-7366; Markussen et al., 1991, J. Biol. Chem. 266,
18814-18818; Schaffer, 1994, Eur. J. Biochem. 221, 1127-1132) and the values
of ~1.0 - 5.0 x 10^-9 M reported for the hIR +11 ectodomain (Schaefer et al.,
1992, J. Biol. Chem. 267, 23393-23402; Whittaker et al., 1994, Molec.
Endocrinol. 8, 1521-1527; Cosgrove et al., 1995, Protein Expression and
Purification 6, 789-798).
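For a single class of binding sites, a linear Scatchard plot of bound/free against bound has slope -1/Kd and an x-intercept equal to the total site concentration. The following minimal Python sketch shows how a Kd could be extracted from such a plot; the function names and the synthetic binding data (generated here from an assumed Kd of 1.5 x 10^-9 M) are illustrative assumptions, not measurements from this specification.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit returning (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def scatchard(bound, free):
    """Return (Kd, Bmax) from bound and free ligand concentrations in M."""
    ratios = [b / f for b, f in zip(bound, free)]
    slope, intercept = fit_line(bound, ratios)
    kd = -1.0 / slope            # slope of the Scatchard plot is -1/Kd
    bmax = -intercept / slope    # x-intercept is the site concentration
    return kd, bmax


# Synthetic single-site data (hypothetical values, for illustration only).
ASSUMED_KD = 1.5e-9
ASSUMED_BMAX = 1.0e-10
free_conc = [0.2e-9, 0.5e-9, 1.0e-9, 2.0e-9, 5.0e-9]
bound_conc = [ASSUMED_BMAX * f / (ASSUMED_KD + f) for f in free_conc]
fitted_kd, fitted_bmax = scatchard(bound_conc, free_conc)
```

A curved rather than linear plot would indicate that the single-site assumption behind this fit does not hold.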
Expression of hIGF-1R ectodomain

Cloning, expression and purification of this protein used elements
common to those described for hIR -11 ectodomain (Cosgrove et al., 1995,
Protein Expression and Purification 6, 789-798), and resulted in purified
product that was recognised by receptor-specific Mabs 17-69, 24-31 and 24-60
(Soos et al., 1992, J. Biol. Chem. 267, 12955-63) and was composed of α and
β' subunits of mass similar to those of hIR ectodomain.

Preparation of hIR -11 ectodomain/MFab complexes

A complex of hIR -11 ectodomain and Fab from antibody 83-14 eluted
as a symmetrical peak of 460-500 kDa (Figure 10), as did complexes
generated from a mixture of hIR -11 ectodomain with Fab from antibody
18-44 and a mixture of hIR -11 ectodomain with Fab 83-7 (not shown). A
co-complex of ectodomain with Fabs from antibodies 18-44 and 83-14 eluted at
~620 kDa, as did a co-complex with MFabs 83-14/83-7 and another with
MFabs 83-7/18-44 (not shown). A complex of hIR -11 ectodomain with all
three MFab derivatives, 18-44, 83-7 and 83-14, eluted at an apparent mass of
~710 kDa (Figure 10).
Electron microscopy

Imaging of hIR -11 and hIGF-1R ectodomains

Single-molecule imaging of uncomplexed dimeric hIR -11
ectodomain was carried out under a variety of negative staining conditions,
which emphasised different aspects of the structure of the molecular
envelope. Images obtained by this investigation are depicted in Figure 11.
The least aggressive or penetrative stain was potassium
phosphotungstate (KPT), which revealed consistent globular particles with
very little internal structure other than a suggestion of a division into two
parallel bars. Staining with methylamine tungstate also revealed the parallel
bar images.

Further investigation using progressively more penetrative, but also
potentially more disruptive, stains confirmed the observations above.
Staining with uranyl acetate and uranyl formate showed the separation of the
parallel bars most clearly, but uranyl acetate showed evidence of disrupting
the structure of the particles, i.e. a decrease in the consistency of the particle
shape and a tendency for particles to look unravelled or denatured despite
having been subjected to chemical cross-linking prior to staining. In areas of
thicker stain, parallel bars predominated, whereas in more thinly stained
regions, U-shaped particles could be identified, sometimes outnumbering the
parallel-bar structures (see Figure 11).

Imaging of hIR -11 ectodomain complexed with 83-7 MFab

This complex was particularly noteworthy for the consistency of the
form of the particles, especially under the gentler staining conditions
afforded by stains such as KPT and methylamine tungstate. The particles
were interpreted as having been restricted in the views they presented, after
air-drying on the carbon support film, by the almost diametrically opposite
binding of the two Fab arms to the antigen to form a highly elongated
complex structure. Under these conditions three distinct views could be
recognised (see Figure 11). Two views (interpreted as top-down/bottom-up)
show the Fab arms displaced clockwise or anti-clockwise as extensions of the
parallel plates with two-fold symmetry. The third view shows an image with
the two Fab arms in line roughly through the centre of the receptor on its
opposite sides, interpreted as a side projection of binding half-way up the
plates.

The use of aggressive uranyl stains operating at lower pHs revealed
internal structure of the molecular envelope at the expense of consistency of
the particle morphology. For example, staining with uranyl acetate or uranyl
formate showed that parallel bars can be seen in particles in which the Fab
arms are displaced either clockwise or anticlockwise but not where the
intermediate central or axial position of the two Fab arms is presented in
projection. These observations show 83-7 MFab binding roughly half-way up
the side-edge of each hIR -11 ectodomain plate. The epitope recognised by
Mab 83-7 has been mapped to the cys-rich region, residues 191-297, by
analysis of chimeric receptors (Zhang and Roth, 1991, Proc. Natl. Acad. Sci.
USA 88, 9858-9862).

Imaging of hIR -11 ectodomain complexed with either 83-14 MFab or 18-44
MFab

Complexes were formed with Fabs from the most insulin-mimetic
antibody, Mab 83-14. Projections showing the Fab arms bound to and
extending out from near the base of the U-shaped particles were identified.
A second field of particles showed objects composed of two parallel bars as
observed for the undecorated ectodomain, with Fab arms projecting obliquely
from diametrically opposite extremities (see Figure 11). Similar but less
definitive images were also seen when MFab 18-44 was bound to hIR -11
ectodomain. The epitope for Mab 83-14 is between residues 469-592 (Prigent
et al., 1990) in the connecting domain. This domain contains one of the
disulphide bonds (Cys524-Cys524) between the two monomers in the IR
dimer (Schaffer and Ljungqvist, 1992, Biochem. Biophys. Res. Commun. 189,
650-653). The epitope for Mab 18-44 is a linear epitope, residues 765-770
(Prigent et al., 1990, J. Biol. Chem. 265, 9970-9977) in the β-chain, near the
end of the insert domain (O'Bryan et al., 1991, Mol. Cell. Biol. 11,
5016-5031). The insert domain contains the second disulphide bond connecting
the two monomers in the IR dimer (Sparrow et al., 1997, J. Biol. Chem. 272,
29460-29467).

Imaging of hIR -11 ectodomain co-complexed with two different MFabs per 
monomer 

The double complex of hIR -11 ectodomain with MFabs 83-7 and 18-44
was stained with 2% KPT at pH 6.0, and revealed the molecular
envelopes. The particle appears complex in shape, and can assume a number
of different orientations on the carbon support film, giving rise to a number
of different projections in the micrograph. The predominant view is of an
asymmetric X-shape (some examples circled). It shows the 83-7 MFab arms
bound at opposite ends of the parallel bars with the two 18-44 MFabs
appearing as shorter projections extending out from either side of each
ectodomain.

Images of the double complex of hIR -11 ectodomain with 83-7 and
83-14 MFabs gave X-shaped images similar to those seen with the 83-7/18-44
double complex. In contrast, the double complex of hIR -11 ectodomain with
18-44 and 83-14 MFabs did not present the characteristic asymmetric
X-shapes described above. Instead, the molecular envelope appeared to be
elongated in many views, with only an occasional X-shaped projection.
While a detailed interpretation of these images would be premature, it is
clear that MFabs 18-44 and 83-14, two of the more potent insulin-mimetic
antibodies (Prigent et al., 1990, J. Biol. Chem. 265, 9970-9977), can bind
simultaneously to the receptor.

Imaging of hIR -11 ectodomain co-complexed with three different MFabs
per monomer

A field of particles was imaged from a micrograph of hIR -11 ectodomain
complexed simultaneously with MFabs 83-7, 83-14 and 18-44. In the thicker
stain regions the molecular envelope was X-shaped, and looked very similar
to that of the double complexes of hIR -11 ectodomain with either 83-7 and
18-44 or 83-7 and 83-14. However, in the more thinly stained regions
particles of greater complexity were visible, and it was possible occasionally
to identify that there are in fact more than four MFabs bound to the
ectodomain dimer.

The single-molecule imaging of hIR -11 ectodomain presented here
suggests a molecular envelope for this dimeric species significantly different
from that of any previously published study. However, an unequivocal
determination of the molecular envelope even from the present study is not
entirely straightforward. A major complicating factor here has been the
relative fragility of the expressed ectodomain when exposed to the rigors of
electron microscope preparation by negative staining. For example, staining
with potassium phosphotungstate (KPT, pH 6.0-7.0) frequently suggested a
denaturation of the dimeric molecules, but when appropriate conditions were
satisfied, good, seemingly interpretable molecular envelope images were
achieved; staining with methylamine tungstate (pH ~7.0) supported the best
KPT molecular envelope images, but had the suggestion of a swelling of the
molecular structure at neutral pH; and the acid-pH stains of uranyl acetate
(pH ~4.2) and uranyl formate (pH ~3.0), with their ability to penetrate the
ectodomain structure, appeared to illuminate not so much the molecular
envelope as the zones of high projected protein density within the dimer.

An amalgam of impressions from these various staining regimens has
led to the following interpretation of single-molecule images of these
undecorated, or naked, dimers: the predominant dimeric molecular image
encountered here has been that of "parallel bars" of projected protein density.
This view is so predominant, indeed, that it suggests there is either a single
preferred orientation of the molecules on the glow-discharged carbon support
film, or that this impression of parallel bars of density may represent a
mixture of superficially similar structure projections, with the subtleties of
these different projections being masked by the relatively coarse resolution of
this single-molecule direct imaging. The impression of parallel bars of
projected protein density is particularly predominant in regions of thicker
negative stain. A second view of the molecular envelope, appreciably less
well represented in regions of thicker stain but predominant in regions of
thin staining, is that of 'open' U's, or V's. These two views of hIR -11
ectodomain were supported by the single-molecule imaging of hIGF-1R
ectodomain under comparable conditions of negative staining.

If the assumption is made that these two recognisable projected
views, that of parallel bars and of open U's/V's, are different views of the
same dimeric molecule, an assumption strongly supported by the MFab
complex imaging, a coarse model of the molecular envelope can be
rationalized. The model structure is roughly that of a cube, composed of two
almost-parallel plates of high protein density, separated by a deep cleft of low
protein main-chain and side-chain density able to be penetrated by stain,
and connected by intermediate stain-excluding density near what is assumed
here to be their base (that is, nearest the membrane-anchoring region). The
width of the low-density cleft appears to be of the order of 30-35 Å, sufficient
to accommodate the binding of the insulin molecule of diameter ca. 30 Å,
although we have no electron microscopical evidence to support insulin
binding in this cleft at this stage.

It has been established through imaging of bound 83-7 MFab that
there is a dimeric two-fold axis normal to the membrane surface between
these plates of density. Occasionally, dimer images display a relative
displacement of the bars of density, interpreted here as a limited capacity for
a shearing of the interconnecting zone between the two plates along their
horizontal axis parallel to the membrane; other images show bars skewed
from parallel, implying a limited capacity for the plates to rotate
independently around the two-fold axis, again via this interconnecting zone.
These two observations each suggest a relatively flexible connectivity
between the dimer plates in the membrane-proximal region of intermediate
protein density, which could possibly contribute to the transmembrane
signalling process.

The approximate overall measured dimensions of the ectodomain
dimer are 110 x 90 x 120 Å, calibrated against the dimensions of imaged
influenza neuraminidase heads, known from the solved X-ray structure
(Varghese et al., 1983, Nature 303, 35-40). It can be noted that there is a
compatibility here between the molecular weights and molecular dimensions
of these two molecular species: the compact tetrameric influenza
neuraminidase heads of Mr ~200 kDa occupy a volume almost 100 x 100 x
60 Å; the more open dimeric insulin receptor ectodomains of similar Mr
~240 kDa imaged here occupy a volume approximately 110 x 90 x 120 Å,
roughly twice that of the neuraminidase heads, accommodating the slightly
higher molecular weight and substantial central low-density cleft.
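
The volume comparison above can be checked with simple arithmetic. The
sketch below treats each species as a rectangular box with the stated edge
lengths, which is an assumption made only for illustration (the real
envelopes are not boxes), and uses the approximate Mr values quoted in the
text. The variable and function names are hypothetical.

```python
def box_volume(dimensions):
    """Volume in cubic angstroms of a box with edge lengths in angstroms."""
    a, b, c = dimensions
    return a * b * c


# Edge lengths (angstroms) and approximate Mr (kDa) as quoted in the text.
neuraminidase_dims, neuraminidase_mr = (100, 100, 60), 200
ectodomain_dims, ectodomain_mr = (110, 90, 120), 240

v_na = box_volume(neuraminidase_dims)
v_ir = box_volume(ectodomain_dims)
volume_ratio = v_ir / v_na

# Mass per unit box volume of the ectodomain relative to neuraminidase.
relative_packing = (ectodomain_mr / v_ir) / (neuraminidase_mr / v_na)

print(f"neuraminidase heads: {v_na} cubic angstroms")
print(f"IR ectodomain dimer: {v_ir} cubic angstroms")
print(f"volume ratio: {volume_ratio:.2f}")
print(f"relative packing: {relative_packing:.2f}")
```

On these box estimates the ectodomain dimer occupies about twice the volume
of the neuraminidase heads for only about 1.2 times the mass, which is in
keeping with an open structure containing a low-density cleft.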

The low-resolution, roughly cubic, compact structure proposed here
differs substantially from the T-shaped model proposed by Christiansen et al.
(1991, Proc. Natl. Acad. Sci. U.S.A. 88, 249-252) and Tranum-Jensen et al.
(1994, J. Membrane Biol. 140, 215-223) for the whole receptor and the
elongated model proposed by Schaefer et al. (1992, J. Biol. Chem. 267,
23393-23402) for soluble ectodomain. Significantly, those previous studies did
not provide any convincing independent electron microscopical evidence that
their imaged objects were in fact insulin receptor.

In the present study, the identity of the imaged molecules as hIR -11
ectodomain has been confirmed by imaging complexes of the dimer with
Fabs of the three well-established conformational Mabs against native hIR,
83-7, 83-14 and 18-44 (Soos et al., 1986, Biochem. J. 235, 199-208; 1989, Proc.
Natl. Acad. Sci. USA 86, 5217-5221), bound singly and in combination. In all
these instances, virtually every particle in the field of view exhibited MFab
decoration through binding to conformational epitopes, establishing not only
the identity of the imaged particles but also the conformational integrity of
the expressed ectodomains. Furthermore, the cleanliness and uniformity of
these hIR -11 ectodomain preparations, both naked and decorated, visualised
here by electron microscopy demonstrate their high suitability for X-ray
crystallization trials.

The known flexibility of the Fab arms exacerbates image-to-image
variability beyond the limited extent already described for the undecorated
dimeric ectodomains, complicating any precise interpretation of these
antigen-antibody complexes. Such molecular flexibility also renders largely
impractical any single-molecule computer image averaging to facilitate image
interpretation, progressively more so with the higher order antigen-antibody
complexes studied here.

The most readily interpretable of these images, showing least
image-to-image variability, are those of 83-7 MFab bound to dimers where,
fortuitously, the antigen-antibody complex is constrained in its degrees of
rotational freedom on the carbon support film. Many projected images show
the two Fab arms in line roughly through the centre of the antigen on its
opposite sides, interpreted as a side projection of binding half-way up the
plates from their membrane-proximal base. Other sub-sets of images show
the two Fab arms still parallel but displaced clockwise or anticlockwise with
2-fold symmetry, each Fab approximating an extension of one of the parallel
bars of antigen density, interpreted here as representing top or bottom
projections along the 2-fold axis. The third projection, along the axis of the
Fab arms, could not be sampled here because of the constraining geometry of
this molecular complex. These observations suggest binding of 83-7 MFab
roughly half-way up the side-edge of the hIR -11 ectodomain plate. This then
allows an initial attempt at spatially mapping the 83-7 MFab epitope, which
has been sequence-mapped to residues 191-297 in the cys-rich region of the
insulin receptor (Zhang and Roth, 1991, Proc. Natl. Acad. Sci. USA 88,
9858-9862). The spatial separation and relative orientations of the two binding
epitopes of Mab 83-7 on the hIR -11 ectodomain dimer as indicated here
appear inconsistent with the proposal that Mab 83-7 could bind
intramolecularly to hIR (O'Brien et al., 1987, Biochem. J. 6, 4003-4010).

Decoration of the ectodomain dimer with 83-7 MFab established that
the two plates of high protein density are arranged with 2-fold symmetry.
Decoration with either 83-14 or 18-44 MFab, on the other hand, allowed
sampling of the third projection of the ectodomain dimer precluded by 83-7
MFab binding. Significantly, this third view established unequivocally the
U-shaped projection of the hIR -11 ectodomain dimer, something which could
only be assumed from the undecorated ectodomain images. Further,
this projection has allowed a rough spatial mapping close to the base of the
U-shaped dimer for the epitopes recognised by 83-14 MFab (residues 469-592,
connecting domain) and 18-44 MFab (residues 765-770, β-chain insert
domain; exon 11 plus numbering, Prigent et al., 1990, J. Biol. Chem. 265,
9970-9977).

Inherent in the model structure is the implication that, with the
two-fold axis aligned normal to the membrane surface, the mouth of the
low-density cleft where insulin binding may occur would lie most distant from
the transmembrane anchor, whilst the zone of intermediate density
connecting the two high-density plates would be in close proximity to the
membrane. It follows, in this model, that the L1/cys-rich/L2 domains (Bajaj et
al., 1997, Biochim. Biophys. Acta 916, 220-226; Ward et al., 1995, Proteins:
Struct., Funct., Genet. 22, 141-153), which comprise much of the
insulin-binding region (see Mynarcik et al., 1997, J. Biol. Chem. 272, 2077-2081),
most probably lie in the membrane-distal upper halves of the two plates,
whilst the membrane-proximal lower halves contain the connecting domains,
the fibronectin-type domains, the insert domains and the interchain
disulphide bonds (Schaffer and Ljungqvist, 1992, Biochem. Biophys. Res.
Commun. 189, 650-653; Sparrow et al., 1997, J. Biol. Chem. 272,
29460-29467). Such a disposition of domains is supported by the images seen with
the single MFab decoration, the 83-7 MFab epitope in the cys-rich region
being spatially mapped roughly half-way up the side-edge of the ectodomain
plates, and the 83-14 and 18-44 MFab epitopes (connecting domain and
β-chain insert domain, respectively) being mapped near the base of the plates.
Our preference is for a single α-β' monomer to occupy a single plate,
although the possibility of a single monomer straddling the two plates of
protein density cannot be discounted.

The more complex images involving co-binding of two, and even
more so of all three, MFabs to each monomer of the ectodomain dimer are
not easily interpretable with respect to relative domain arrangements within
the monomer at present, not least of all because of the difficulty of finding
conditions of negative staining that will simultaneously maintain the
integrity of the Fab binding while highlighting recognisable and
reproducible details of the internal structure of the dimeric IR ectodomain.

The data presented here demonstrate the ability of single-molecule
imaging to give an initial insight into the topology of multidomain structures
such as the ectodomain of hIR, and the value of combining this technique
with that of either single or multiple monoclonal Fab attachment per
monomer as a potential means of epitope, and domain, mapping of the
structure. By imaging Fab complexes of other members of the family, such as
hIGF-1R ectodomain, and combining available sequence-mapped epitope
information with that presented here, a more comprehensive understanding
of domain arrangements within the IR family ectodomains should be
forthcoming.

EXAMPLE 5

Structure-Based Design of Ligands for the IGF Receptor as Potential 
Inhibitors of IGF Binding 

The structure of IGF receptor can be considered as a filter or screen
to design, or evaluate, potential ligands for the receptor. Those skilled in the
art can use a number of well known methods for de novo ligand design, such
as GRID, GREEN, HSITE, MCSS, HINT, BUCKETS, CLIX, LUDI, CAVEAT,
SPLICE, HOOK, NEWLEAD, PRO_LIGAND, ELANA, LEGEND, GenStar,
GrowMol, GROW, GEMINI, GroupBuild, SPROUT, and LEAPFROG, to
generate potential agonists or antagonists for IGF-1R. In addition, the IGF-1R
structure may be used as a query for database searches for potential ligands.
The databases searched may be existing, e.g. ACD, Cambridge
Crystallographic, NCI, or virtual. Virtual databases, which contain very large
numbers (currently up to 10^^) of chemically reasonable structures, may be
generated by those skilled in the art using techniques such as DBMaker,
ChemSpace, TRIAD and ILIAD.

The IGFR structure contains a number of sites into which putative
ligands may bind. Search strategies known to those skilled in the art may be
used to identify putative ligands for these sites. Examples of two suitable
search strategies are described below:

(i) Database Search

The properties of key parts of the putative site may be used as a database
search query. For example, the Unity 2.x database software may be used. A
flexible 3D search can be run in which a "directed tweak" algorithm is used to
find low energy conformations of potential ligands which satisfy the query.

(ii) De novo design of ligands

The Leapfrog algorithm as incorporated in the software package, Sybyl
version 6.4.2 (Tripos Associates, St Louis), may be used to design potential
ligands for IGF-1R sites. The coordinates of residues around the site may be
taken from the x-ray structure, and hydrogens and charges (Kollman all atom
dictionary charges) added. From the size, shape and properties of the site a
number of potential ligands may be proposed. Leapfrog may be used to optimize
the conformation of ligands and position on the site, to rank the likely strength
of binding interactions with IGF-1R, and to suggest modifications to the
structures which would have enhanced binding.

It is also possible to design ligands capable of interacting with more 
than one site. One way in which this may be done is by attaching flexible 
linkers to ligands designed for specific sites so as to join them. The linkers 
may be attached in such a way that they do not disrupt the binding to 
individual sites. 
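
The idea behind the database search of strategy (i) can be illustrated with
a toy example. The sketch below is not the Unity software or its "directed
tweak" algorithm: it assumes that each compound is stored with a small set
of precomputed conformers, and it accepts a compound if any conformer
satisfies a set of distance constraints between labelled features. The
feature labels, distances, coordinates, compound names and tolerance are all
hypothetical.

```python
import math


def satisfies(conformer, query, tolerance=0.5):
    """Return True if every distance constraint in the query is met.

    conformer: dict mapping a feature label to its (x, y, z) position.
    query: list of (label_a, label_b, target_distance) tuples.
    Distances and tolerance are in angstroms.
    """
    for label_a, label_b, target in query:
        if label_a not in conformer or label_b not in conformer:
            return False
        d = math.dist(conformer[label_a], conformer[label_b])
        if abs(d - target) > tolerance:
            return False
    return True


def search(database, query, tolerance=0.5):
    """Names of compounds with at least one conformer matching the query."""
    return [name for name, conformers in database.items()
            if any(satisfies(c, query, tolerance) for c in conformers)]


# Hypothetical query derived from the geometry of a putative binding site.
QUERY = [("donor", "acceptor", 6.0),
         ("donor", "hydrophobe", 4.0),
         ("acceptor", "hydrophobe", 5.0)]

# Hypothetical database: compound name -> list of conformers.
DATABASE = {
    "cmpd_A": [
        # Extended conformer: donor-acceptor distance is too long.
        {"donor": (0, 0, 0), "acceptor": (9, 0, 0), "hydrophobe": (0, 4, 0)},
        # Folded conformer: all three constraints are met.
        {"donor": (0, 0, 0), "acceptor": (6, 0, 0),
         "hydrophobe": (2.25, 3.3, 0)},
    ],
    "cmpd_B": [
        {"donor": (0, 0, 0), "acceptor": (6, 0, 0), "hydrophobe": (3, 8, 0)},
    ],
}

hits = search(DATABASE, QUERY)
print(hits)
```

A real flexible search adjusts torsion angles during the search rather than
enumerating stored conformers, so this sketch conveys only the matching
criterion and not the search method.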



All references cited above are incorporated herein in their entirety by
reference.



It will be appreciated by persons skilled in the art that numerous
variations and/or modifications may be made to the invention as shown in
the specific embodiments without departing from the spirit or scope of the
invention as broadly described. The present embodiments are, therefore, to
be considered in all respects as illustrative and not restrictive.

Claims:

1. A method of designing a compound able to bind to a molecule of the
insulin receptor family and to modulate an activity mediated by the molecule,
including the step of assessing the stereochemical complementarity between the
compound and the receptor site of the molecule, wherein the receptor site
includes:

(a) amino acids 1 to 462 of the receptor for IGF-1, having the atomic
coordinates substantially as shown in Figure 1;

(b) a subset of said amino acids; or

(c) amino acids present in the amino acid sequence of a member of the
insulin receptor family, which form an equivalent three-dimensional structure to
that of the receptor molecule as depicted in Figure 1.

2. A method according to claim 1, in which the compound is selected or
modified from a known compound identified from a database.

3. A method according to claim 1, in which the compound is designed so 
as to complement the structure of the receptor molecule as depicted in Figure 1. 

4. A method according to any one of claims 1 to 3, in which the compound
has structural regions able to make close contact with amino acid residues at the
surface of the receptor site lining the groove, as depicted in Figure 2.

5. A method according to any one of claims 1 to 4, in which the compound
has a stereochemistry such that it can interact with both the L1 and L2 domains
of the receptor site.

6. A method according to any one of claims 1 to 4, in which the compound
has a stereochemistry such that it can interact with the L1 domain of a first
monomer of the receptor homodimer, and with the L2 domain of the other
monomer of the receptor homodimer.

7. A method according to any one of claims 1 to 4, in which the interaction
of the compound with the receptor site alters the position of at least one of the
L1, L2 or cysteine-rich domains of the receptor molecule relative to the position
of at least one of the other of said domains.

8. A method according to claim 7, in which the compound interacts with
the β sheet of the L1 domain of the receptor molecule, thereby causing an
alteration in the position of the L1 domain relative to the position of the
cysteine-rich domain or of the L2 domain.

9. A method according to claim 7, in which the compound interacts with
the receptor site in the region of the interface between the L1 domain and the
cysteine-rich domain of the receptor molecule, thereby causing the L1 domain
and the cysteine-rich domain to move away from each other.

10. A method according to claim 7, in which the compound interacts with
the hinge region between the L2 domain and the cysteine-rich domain of the
receptor molecule, thereby causing an alteration in the positions of the L2
domain and the cysteine-rich domain relative to each other.

11. A method according to any one of claims 1 to 10, in which the
stereochemical complementarity between the compound and the receptor site is
such that the compound has a K,, for the receptor site of less than lO^M.



12. A method according to claim 11, in which the K,, is less than 10"M.



13. A method according to any one of claims 1 to 12, in which the
compound has the ability to increase an activity mediated by the receptor
molecule.



14. A method according to any one of claims 1 to 12, in which the 
compound has the ability to decrease an activity mediated by the receptor 
molecule. 

15. A method according to claim 14, in which the stereochemical
interaction between the compound and the receptor site is adapted to prevent
the binding of a natural ligand of the receptor molecule to the receptor site.

16. A method according to claim 14 or claim 15, in which the compound has
a Ki of less than 10"°M.

17. A method according to claim 16, in which the compound has a Ki of less
than 10 ^'M.

18. A method according to claim 17, in which the compound has a Ki of less
than lO'^'M.

19. A method according to any one of claims 1 to 18, in which the receptor
is the IGF-1R.

20. A method according to any one of claims 1 to 18, in which the receptor 
is the insulin receptor. 


21. A computer-assisted method for identifying potential compounds able to
bind to a molecule of the insulin receptor family and to modulate an activity
mediated by the molecule, using a programmed computer including a processor,
an input device, and an output device, including the steps of:

(a) inputting into the programmed computer, through the input
device, data comprising the atomic coordinates of the IGF-1R molecule as shown
in Figure 1, or a subset thereof;

(b) generating, using computer methods, a set of atomic coordinates of
a structure that possesses stereochemical complementarity to the atomic
coordinates of the IGF-1R site as shown in Figure 1, or a subset thereof, thereby
generating a criteria data set;

(c) comparing, using the processor, the criteria data set to a computer
database of chemical structures;

(d) selecting from the database, using computer methods, chemical
structures which are structurally similar to a portion of said criteria data set; and

(e) outputting, to the output device, the selected chemical structures
which are similar to a portion of the criteria data set.

22. A computer-assisted method according to claim 21, in which the method
is used to identify potential compounds which have the ability to decrease an
activity mediated by the receptor.

23. A computer-assisted method according to claim 21 or claim 22, which
further includes the step of selecting one or more chemical structures from step
(e) which interact with the receptor site of the molecule in a manner which
prevents the binding of natural ligands to the receptor site.

24. A computer-assisted method according to any one of claims 21 to 23,
which further includes the step of obtaining a compound with a chemical
structure selected in steps (d) and (e), and testing the compound for the ability to
decrease an activity mediated by the receptor.

25. A computer-assisted method according to claim 21, in which the method
is used to identify potential compounds which have the ability to increase an
activity mediated by the receptor molecule.



26. A computer-assisted method according to claim 25, further including the
step of obtaining a molecule with a chemical structure selected in steps (d) and
(e), and testing the compound for the ability to increase an activity mediated by
the receptor.

27. A computer-assisted method according to any one of claims 21 to 26, in
which the receptor is the IGF-1R.

28. A computer-assisted method according to any one of claims 21 to 26, in
which the receptor is the insulin receptor.

29. A method of screening of a putative compound having the ability to
modulate the activity of a receptor of the insulin receptor family, including the
steps of identifying a putative compound by a method according to any one of
claims 1 to 29, and testing the compound for the ability to increase or decrease
an activity mediated by the receptor.



30. A method according to claim 29, in which the test is carried out in vitro.

31. A method according to claim 29, in which the test is a high throughput
assay.

32. A method according to claim 29, in which the test is carried out in vivo.

33. A method according to claim 30, in which the test is carried out in vivo.

Figure 1

Figure 5

Figure 9: Sequence alignment of hIGF-1R, hIR and hIRR ectodomains.

Generated using the PileUp program in the software package of the
Genetics Computer Group, 575 Science Drive, Madison, Wisconsin, USA.

Symbol comparison table: GenRunData:PileUpPep.Cmp  CompCheck: 1254

GapWeight: 3.0
GapLengthWeight: 0.1



Na^" Check: 1781 Weight 

Name. H^rr : 972 Check: 9819 Weight 



1.00 
1.00 
1.00 



"^'Jfr ut";^;!^,^^ GIDIWfDYQQ LKRLENCTVI EGYLHILLIS K. .AEDYRSY 43 
Hir HLYPGEVC.P GMDIRNNLTR LHELeFJHvI EGHLQILLMF KTRP^DfSI 4 9 
Hxrr . . . .MNVC.P SLDIRSEVAE LROLEnHw EGHLQILmr tSSdfS ^5 

"^^H^r ??PK™Jn ^^^^f«VAGL ESLGDLFPNL TVIRGWKLFY NYALVIFEMT 93 
Hirr SFPRlS"S Itffll^^r^ ESLKDLFPnTTvIRGSRLFF kSJSJ 99 
SFPRLTQVTD YLLLFRVYGL ESLRDLFPNL AVIRGTRLFL GYALVIFEMP 95 

"'"H^r HlSJ^I!' SF^^'^'''' H:KNADLiYLS TVDWSLILDA VSNNYIVGNK 143 
■ HiJ- HTRm/nr^^ ^mTRGSVRI EKNNELCYLA TIDWSRILDS VEDNYIVLNK 149 
Hxr. HLRDVALPAL GAVLRGAVRV EKNQELCHLS TIDWGLLQPA PGAJ^IH^VGNK ^45 

"^^H^"" llLl'^'''''' PGTMEEKPM. CEKTTINNEY NYRCWTTNRC QKMCPSTCGK 191 
Hxr DDNEECGDICPGTAKGKTN. CPATVINGQF VERCWTHSHC qS^S^ISs 
H.rr LG.EECADVC PGVLGAAGEP C^TFSGHT DYRCWTSSHC QRvZc^m 193 

***** + 

Higflr RACTENNECC HPECLGSCSA PDNDTA^'AC RHYYYAGV^ PA^PNTYRF 241 

."^i =s ~i ™- ssii Hi 
"lii Sil^ EE = ^ 

GASLHSVPG RASTFG IHQGSCLAQC PSGFTRNSS. 287 

* * * * 

"^^J?r 'Tf^l^"'' '^'^'^^EEKK TKTIDSVTSA QMLQGCTIFK GNLLINIRRG 337 
Hiir .'fSkSgL S!^"'''^^ EKTIDSVTSA QELRGCTVIN G5LI MIRGG 47 
-.FCHKCEGL CPKECKV..G TKTIDSIQAZV QDLVGCTHvi-^LILNLRQG 335 

"""'itl "^^i^ LGl'J^^g"' lilT.'^Z 3LSFI,KNLR. XLGEEQLEGM 387 

Hi.. Y^aEP...„3 J^SJ?- ---5^;^ J?G™| 3%^^ 

Hi. ™- |g™ ~- ex™

* End of 1-462 fragment

Higf1r  TKGRQSKGDI NTRNNGERAS CESDV LHFTS TTTSKNRIII TWHRYRPPDY 487
Hir     TKGRQERNDI ALKTNGDQAS CENEL LKFSY IRTSFDKILL RWEPYWPPDF 497
Hirr    TRGRQNKAEI NPRTNGDRAA CQTRT LRFVS NVTEADRILL RWERYEPLEA 485

Higflr RDLISFTVYY KEAPFK NVT E YDGQDACGSN SWNMVDVDLP PNKDV 532 

Hir RDLLGFMLFY KEAPYQ NVT E FDGQDACGSM SWTWDIDPP LRSNDPKSQN 547 
Hirr RDLLSFIVYY KESPFQ NAT E HVGPDACGTQ SWNLLDVELP L SRTQ 530 

Higflr EPGILLHGLK PWTQYAVYVK AVTLTMVEND HIRGAKSEIL YIRTNASVPS 582 
Hir HPGWLMRGLK PWTQYAIFVK TL.VTFSDER RTYGAKSDII YVQTDATNPS 596 
Hirr EPGVTLASLK PWTQYAVFVR AITLTTEEDS PHQGAQSPIV YLRTLPAAPT 580 



Higflr IPLDVLSAS N SS SQLIVKWN PPSLPNG NLS YYIVRWQRQP QDGYLYRHNY 632 
Hir VPLDPISVS N SS SQIILKWK PPSDPNG NIT HYLVFWERQA EDSELFELDY 646 
Hirr VPQDVISTS N SS SHLLVRWK PPTQRNGNLT YYLVLWQRLA EDGDLYLNDY 630 

* ****** 

Higflr CSKD.KIPIR KYADGTIDIE EVTENPKTEV CGGEKGPCCA C. . . PKTEAE 678 

Hir CLKGLKLPSR TWS.PPFESE DSQKHNQSE. YEDSAGECCS C. . . PKTDSQ 691 

Hirr CHRGLRLPTS N.NDPRFDGE DGDPEAEME SDCCP CQHPPPGQVL 673 

a >< P 

Higflr KQAEKEEAEY RKVFENFLHN SIFVPRPERK RRDVMQV ANT T MSSRS RNTT 728 

Hir ILKELEESSF RKTFEDYLHN WFVPRPSRK RRSLGDVG NV T VAV? . . .TV 738 

Hirr PPLEAQEASF QKKFENFLHM AITIPISPWK VTSINKSPQR D.SGRHRRAA 722 

it- 

Higflr AA. . DTY NIT DPEELETEYP FFESRVDNKE RTVISNLRPF TLYRIDIHSC 776 
Hir AAFP NTS STS VPTSPEEHRP F..EKVVNKE SLVISGLRHF TGYRIELQAC 786 
Hirr GPLRLGGNSS DFEIQEDKVP RE RAVLSGLRHF TEYRIDIHAC 764 

* 

Higflr NHEAEKLGCS ASNFVFARTM PAEGADDIPG PVTWEPRPEN SIFLKWPEPE 826 

Hir NQDTPEERCS VAAYVSARTM PEAKADDIVG PVTHEIFENN WHLMWQEPK 836 

Hirr MHAAHTVGCS AATFVFARTM PHREADGIPG KVAWEASSKN SVLLRWLEPP 814 

* * 

Higflr MPNGLILMYE IKYGS.QVED QRECVSRQEY RKYGGAKLNR LNPGNYTARI 875 
Hir EPNGLIVLYE VSYRRYGDEE LHLCVSRKHF ALERGCRLRG LSPGNYSVRI 886 
Hirr DPNGLILKYE IKYRRLGEEA TVLCVSRLRY AKFGGVHLAL LPPGNYSARV 864 

Higflr QATSLSGNGS WTDPVFFYVQ AKTGYENFIH L 906 
Hir RATSLAGNGS WTEPTYFYVT DYLDVPSNIA K 917 
Hirr RATSLAG NGS WTDSVAFYIL GPEEEDAGGL H 895 
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Figure 10 
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Schematic interpretations of EM images 
The study of human transforming growth factor-α
(TGF-α) in complex with the epidermal growth factor
(EGF) receptor extracellular domain has been undertaken
in order to generate information on the interactions
of these molecules. Analysis of ¹H NMR transferred
nuclear Overhauser enhancement data for titration of
the ligand with the receptor has yielded specific data on
the residues of the growth factor involved in contact
with the larger protein. Significant increases and decreases
in nuclear Overhauser enhancement cross-peak
intensity occur upon complexation, and interpretation
of these changes indicates that residues of the A- and
C-loops of TGF-α form the major binding interface,
while the B-loop provides a structural scaffold for this
site. These results corroborate the conclusions from
NMR relaxation studies (Hoyt, D. W., Harkins, R. N.,
Debanne, M. T., O'Connor-McCourt, M., and Sykes, B. D.
(1994) Biochemistry 33, 15283-15292), which suggest that
the C-terminal residues of the polypeptide are immobilized
upon receptor binding, while the N terminus of the
molecule retains considerable flexibility, and are consistent
with structure-function studies of the TGF-α/EGF
system indicating a multidomain binding model.
These results give a visualization, for the first time, of
native TGF-α in complex with the EGF receptor and
generate a picture of the ligand-binding site based upon
the intact molecule. This will undoubtedly be of utility
in the structure-based design of TGF-α/EGF agonists
and/or antagonists.



Human TGF-α is a 50-amino acid polypeptide with 40%
sequence homology to epidermal growth factor (1, 2). In addition,
the structural similarity of the two molecules results in
their ability to compete for binding to the EGF receptor (3-5).
Complexation of TGF-α with this receptor is believed to mediate
a variety of biological effects, including embryonic development
of certain tissues and wound healing (6, 7); however, the
major sphere of interest of this protein lies in its role in the
transformation and maintenance of various malignant tumors
(8, 9).

The structural features of the homologous growth factors
that contain three disulfides and hence three loops (A, B, and
C) have been determined by NMR (10-13) and include a
triple-stranded anti-parallel β-sheet comprising the N-terminal region,
a smaller anti-parallel double hairpin in the C terminus of
the molecule, and a helical segment in the A-loop in some
structures. Previous studies undertaken to elucidate the
structurally important residues of TGF-α (and EGF) required for
complexation have implicated residues including Phe-15, Tyr-38,
Arg-42, and Leu-48 (14-18). The consensus of a variety of
structural studies including the use of synthetic peptide fragments
(19-21), recombinant chimeric proteins (22, 23), and
anti-TGF-α and anti-EGF antibodies (24, 25) is that receptor
binding occurs with multiple domains of TGF-α, although
conflicting results have been obtained concerning the involvement
of the A-, and C-loops. The multidomain binding model is
consistent with the observation that, at present, it is not possible
to reduce the size of the growth factor without significantly
compromising its affinity for the EGF receptor. This was
illustrated by deletion studies where the N-terminal residues
outside the A-loop were truncated. This mutant had 3% of the
binding affinity of the intact protein (20). Further data from
receptor-bound TGF-α may lead to the structure-based design
of reductant molecules through the precise identification of
ligand binding determinants.

Hoyt et al. (26) have recently demonstrated, through a study
of ¹H NMR transverse and longitudinal relaxation rates for the
methyl resonances of TGF-α in the free state and in association
with the EGFR-ED, that the C-terminal residues undergo a 
dramatic decrease in flexibility upon binding, while the N 
terminus maintains a degree of mobility similar in both bound 
and free forms of the ligand. The mid-portion of the molecule 
underwent a moderate decrease in flexibility relative to the 
uncomplexed polypeptide. The conclusions of this work are 
consistent with the previous studies, which suggest that the 
C-terminal residues are responsible for receptor binding, while 
the B-loop provides a structural scaffold for the primary site of 
interaction. 

This study presents new insight into the components of the 
TGF-α structure that are requisite for complex formation from

This paper is available on line at http://www-jbc.stanford.edu/jbc/
the analysis of changes in the two-dimensional ¹H NMR
NOESY spectra of the ligand upon titration with the EGFR-ED
(molecular mass of 85 kDa). NMR is the method of choice in the
elucidation of the ligand contact sites with the receptor since it
is the only technique that can give detailed information on
these molecules in solution. The results suggest the involvement
of residues in the A- and C-loops and C-terminal tail in
the primary site of interaction of the growth factor with its
receptor.

EXPERIMENTAL PROCEDURES

Expression and Purification of TGF-α - TGF-α and the EGFR-ED
[...] according to previous methods (26).

[...] lyophilized sample of TGF-α was
added 460 μl of buffer containing 50 mM potassium phosphate, 10 mM
potassium chloride, 1 mM EDTA, 0.5 mM sodium azide, [...] sodium
2,2-dimethyl-2-silapentane-5-sulfonate (internal standard), 99[...]
D₂O or 90/10% (v/v) H₂O/D₂O. The solution was adjusted to pH 6.0 by
[...] relative concentrations of 0, 2.4,
4.9, and 6.5% EGFR-ED.

NMR Spectroscopy - ¹H NMR spectra for TGF-α free in solution
[...] presence of various amounts of EGFR-ED were acquired at
[...] spectrometer. These included
one-, two-, and three-dimensional experiments collected at 298 K,
referenced relative to an internal sodium 2,2-dimethyl-2-silapentane-5-sulfonate
standard and utilizing presaturation to [...] suppression of
the water resonance. The hypercomplex method (27) was used for
[...] NOESY spectra (28-3[...])
[...] 8000 Hz and 2048 data points. The Fourier transformation of the
spectra utilized shifted sinebell and zero filling to 4096 points in both
[...] three-dimensional ¹⁵N-edited NOESY (HMQC-NOESY)
[...] mixing time of 150 ms with [...]
widths of 8000, 4000, and 1500 Hz for the ¹H, NH, and ¹⁵N dimensions,
respectively, with 128 (t1), 32 (t2), and 512 (t3) complex points and eight
scans per increment. Processing of the FID was performed using the
[...]

NOEs were assigned based on the complete resonance assignment
previously reported for TGF-α at pH 6.0 (26), and their volume was
integrated using a combination of the NMR processing and autoassignment
[...] VNMR (VNMR 5.1A, Varian Associates [...])
[...] intensity of the NOEs over the course of
the receptor titration were analyzed using the program SPEAK (Robert
Boyko, University of Alberta). The program [...] the NOEs [...]
the increase or decrease in their volume (scaled by the average intensity
[...] spectrum) as a percentage of the volume for the corresponding
cross-peak in the free ligand.

RESULTS

Two-dimensional ¹H NMR NOESY spectra of TGF-α free in
solution and in complex with the [...]
receptor molecule is studied in excess (usually 10:1 to 20:1) over
the receptor. During the course of the NOESY experiment,
bound ligand magnetization is transferred to the excess free
ligand through chemical exchange, and thus, the NMR information
characterizing the bound structure is observed via the
sharp resonances of the free ligand. At pH 6.0, TGF-α is in [...]
exchange with the EGFR-ED and has an off-rate of [...]
an on rate of ~600 s[...]
(26) and therefore [...] TR-NOESY [...] conditions amenable to
[...] resonances of TGF-α [...] proton NMR
[...] previously reported
[...] of TR-NOESY [...] cross-peaks for a
[...] presence
[...] of a total of 544 [...]
cross-peaks in the free ligand, 419 were
assigned unambiguously. Of the 419 unambiguous cross-peaks,
404 were assigned from the two-dimensional NOESY spectrum,
and 15 were additionally assigned from the three-dimensional
HMQC-NOESY spectrum as the latter were able to be resolved
in the ¹⁵N dimension from the three-dimensional spectrum.

After assignment of the TR-NOESY [...]
cross-peaks for the
[...] apparent that very few
[...]. Due to the absence of
[...] cross-peaks or observable chemical shift differences
[...]
for each point of the TGF-α/EGFR-ED titration revealed
[...] on sites of interaction of the growth factor.
This was achieved by ranking the cross-peaks by the slope of
the NOE intensity changes during the receptor titration series.
For each spectrum in the series, the NOE [...] volume is first
divided by the [...] accurately
[...] spectra. For the addition of 2.4%
EGFR-ED, 524 NOEs were observed in the 150-ms NOESY
[...] disappeared. For the final
[...] 434 NOEs were observed, and thus a
[...] 42 were absent in this spectrum. For the peaks that
remained over the course of the titration, 186 increased in
[...] of the TR-NOESY spectra of
[...] presence of 6.5% EGFR-ED
[...] these spectra that the majority of
[...]
[...] of elucidating the receptor contact sites,
[...] 150-ms NOESY spectra over
[...] titration were examined. Cross-peak intensity
in TR-NOESY spectra is determined by a large number of
factors, including relaxation in the free and bound ligand,
exchange rate, fraction bound, and NOESY mixing time.
Relaxation in the free ligand is determined by internuclear distances
[...] rotational correlation times of the free ligand.
Relaxation in the bound ligand is influenced by
[...] rotational correlation time of [...]
additional relaxation pathways involving contact between
[...] peptide, the intrinsic NOEs
[...] intensities generally increase as the fraction of
[...] or increase with
[...] bound ([...])
[...] these limits are
[...] complicated
when the ligand is also a protein such as TGF-α for which
Fig. 1. TR-NOESY spectra of free TGF-α (A) and in the presence of 6.5% EGFR-ED (B) acquired at 600 MHz at 25 °C with a mixing
time of 150 ms, illustrating the amide and aromatic through-space connectivities of the ligand.



significant NOESY cross-peaks already exist and where differential
mobility exists in both the free and bound states, as is
also the case with TGF-α (26). Thus, one can expect TR-NOESY
cross-peaks to both increase and decrease. A priori, one would
expect shorter bound internuclear distance and increased correlation
time to increase TR-NOESY cross-peak intensity in
the limit of small fraction bound and short NOESY mixing
times, but the effect can be attenuated by rapid cross-relaxation.
Longer distances and additional relaxation pathways
involving ligand protons would lead to a decrease in cross-peak
intensity at small fraction bound and short mixing times.

To quantitate the effects of the competing influences in
TR-NOESY cross-peak intensity, we have simulated spectra with
estimates of the free and bound correlation times (3 and 40 ns,
respectively) and exchange rates (500 s⁻¹) using a program
developed for Mathematica and kindly provided by R. London
(35) and subsequently modified by one of us (B. D. S.). The
simulations indicate that the change in the TR-NOESY intensity
is approximately linear in fraction bound for the values of
fraction bound and mixing time used in this study, as is observed
experimentally. Increases in intensity expected from the
increased bound rotational correlation time are slight at the
fraction bound used. The most striking effects come when additional
relaxation pathways are considered that lead to a
decrease in cross-peak intensity.
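
As a rough illustration of why the intensity change is close to linear in fraction bound, the following Python sketch computes the ¹H-¹H cross-relaxation rate of an isolated spin pair from the standard dipolar expression and averages it over the free and bound states. It is not the Mathematica program used in this work: it assumes fast exchange (so that the effective rate is a population-weighted average), ignores spin diffusion and relaxation leakage to receptor protons, and uses a hypothetical 2.5 Å interproton distance. Only the correlation times (3 and 40 ns), the 600-MHz field, and the 6.5% titration point are taken from the text.

```python
import math

# Physical constants in SI units.
MU0_OVER_4PI = 1e-7      # T m A^-1
GAMMA_H = 2.6752e8       # proton gyromagnetic ratio, rad s^-1 T^-1
HBAR = 1.0546e-34        # J s


def spectral_density(omega, tau_c):
    """Lorentzian spectral density for isotropic rotational diffusion."""
    return tau_c / (1.0 + (omega * tau_c) ** 2)


def cross_relaxation_rate(r_angstrom, tau_c, field_mhz=600.0):
    """Cross-relaxation rate sigma (s^-1) for an isolated 1H-1H pair."""
    omega = 2.0 * math.pi * field_mhz * 1e6
    r = r_angstrom * 1e-10
    prefactor = MU0_OVER_4PI ** 2 * GAMMA_H ** 4 * HBAR ** 2 / (10.0 * r ** 6)
    return prefactor * (6.0 * spectral_density(2.0 * omega, tau_c)
                        - spectral_density(0.0, tau_c))


def averaged_sigma(fraction_bound, r_free=2.5, r_bound=2.5,
                   tau_free=3e-9, tau_bound=40e-9):
    """Fast-exchange (assumed) population-weighted cross-relaxation rate.

    The 2.5 A distances are hypothetical; the correlation times are the
    estimates quoted in the text.
    """
    sigma_free = cross_relaxation_rate(r_free, tau_free)
    sigma_bound = cross_relaxation_rate(r_bound, tau_bound)
    return (1.0 - fraction_bound) * sigma_free + fraction_bound * sigma_bound


if __name__ == "__main__":
    for fb in (0.0, 0.024, 0.049, 0.065):
        print(f"fraction bound {fb:.3f}: sigma = {averaged_sigma(fb):.3f} s^-1")
```

Under these assumptions the averaged rate is exactly linear in fraction bound; the additional relaxation pathways that the simulations identify as the dominant effect are deliberately left out of this sketch.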

Therefore, it was decided that the most unambiguous
TR-NOESY cross-peaks to focus on were those that decrease,
possibly caused by increased internuclear distance, but most likely
as a result of increased relaxation caused by spin-diffusion
contact with protons on the target protein. Although there is an
overall decrease in NOE intensity throughout the TGF-α molecule,
corresponding increases in intensity of a number of
cross-peaks are observed (one-third of the total number of
NOEs are more intense in the TGF-α·EGFR-ED complex).
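
The ranking of cross-peaks by the slope of their intensity change over the titration can be sketched as follows. This is an illustrative Python outline, not the SPEAK program: the peak names and volumes are hypothetical, the volumes are assumed to have already been scaled between spectra, and only the titration points (0, 2.4, 4.9, and 6.5% EGFR-ED) come from the text.

```python
# Titration points from the text, as % EGFR-ED relative to ligand.
TITRATION_PERCENT = [0.0, 2.4, 4.9, 6.5]


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def rank_cross_peaks(volumes):
    """Rank cross-peaks from largest decrease to largest increase.

    `volumes` maps a peak name to its (already scaled) volumes at each
    titration point. Each series is expressed as a percentage of the
    free-ligand volume before the slope is fitted.
    """
    ranked = []
    for name, series in volumes.items():
        relative = [100.0 * v / series[0] for v in series]
        ranked.append((name, least_squares_slope(TITRATION_PERCENT, relative)))
    return sorted(ranked, key=lambda item: item[1])


# Hypothetical volumes, for illustration only.
example_volumes = {
    "peak_a": [1.0, 0.8, 0.6, 0.5],   # decreases on receptor addition
    "peak_b": [2.0, 2.1, 2.2, 2.3],   # increases
    "peak_c": [1.5, 1.5, 1.5, 1.5],   # unchanged
}

if __name__ == "__main__":
    for name, slope in rank_cross_peaks(example_volumes):
        print(f"{name}: {slope:+.2f} % of free volume per % EGFR-ED")
```

Peaks at the top of the resulting list are the decreasing cross-peaks on which the analysis focuses.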



Since there are a number of pathways by which the intensity
increases can occur, their interpretation is not straightforward;
however, since it is probable that magnetization bleed-off occurs
from ligand contact sites, it can be surmised that the
atoms involved in NOEs that increase are those that either are
non-surface protons or are not in contact with the EGF receptor,
which typically experience the largest perturbations upon
binding.

In addition to ligand magnetization loss to receptor protons, 
exchange broadening may play a significant role in the de- 
creased NOE intensities. The interpretation of absent NOEs in 
the receptor-bound ligand is not affected in this case since 
these effects would be most pronounced for the residues and 
atoms in contact with the receptor. 

For the case of the complexation of TGF-a with the EGF 
receptor, in which the changes in the NOE intensity from the 
series of NOESY spectra were examined, the most striking 
changes were in the number of NOEs that disappeared over the 
course of the receptor titration. Of a total of 544 NOEs in the 
uncomplexed form of TGF-a, 110 NOEs were no longer present 
in the bound polypeptide (i.e. the final receptor titration point). 
As discussed, the disappearance of these peaks most probably 
occurs as a result of magnetization bleed-off from protons of the 
ligand that are in direct contact with the receptor. 

From the changes in the NOE intensity upon receptor addition,
an understanding of the molecular nature of the interactions
of the TGF-α·EGFR-ED complex can thus be deduced. As
mentioned, of particular significance are the NOEs that are
absent from the NOESY spectra of bound TGF-α. Fig. 2 illustrates
a Connolly surface representation of the TGF-α structure
and shows the atoms involved in NOEs that disappear in
the NOESY spectrum of the TGF-α-6.5% EGFR-ED complex.
From Fig. 2, the observation can be made that almost the whole
surface of the polypeptide has atoms that lose cross-peaks, with
Fig. 2. Connolly surface representation
of the TGF-α structure. The atoms
involved in NOEs that disappear
upon receptor binding are colored in red.
Two faces of the molecule are shown (180°
rotation) along with a ribbon diagram
(corresponding to the left face) illustrating
the loops and secondary structure.
The three disulfide loops are labeled in
the ribbon diagram.




[...] β-sheet. The face of the molecule displayed on the right of Fig.
[...] concentration of atoms involved in the absent
NOEs, and for this face, the residues forming the C-loop
[...] representation indicates the
[...] binding on the ligand and suggests
[...] of visualization of the receptor binding
interactions is useful, it is insufficient to obtain a statistical
[...]
involved in a number of absent NOEs. To address this, the
[...] each residue was determined,
and the results are expressed as a percentage of NOEs
[...] upon complexation [...] the
number of NOEs per residue in the free ligand. Fig. 3 (A and B)
shows the total number of NOEs per residue and the percentage
of absent NOEs in the receptor-bound ligand, respectively.
[...] information on
the changes that are concomitant with receptor binding. First,
[...] NOEs that disappear
[...] immobilized by the receptor [...]
[...] TGF-α structure
at the N-terminal subdomain embodying part of the triple-stranded
β-sheet. The other noticeable features of the graph
[...] between 40 and 60% of the
[...] segments of the A- and C-loops and the
[...] significant are those amino acids
[...] cross-peaks upon complexation.
If these are considered, almost all of the A-loop
residues (from Ser-11 to Leu-24) are affected.

When the results shown in Fig. 3B are grouped into classes
by percentage and visualized on the TGF-α structure, further
[...] interactions is facilitated. Fig. 4
[...] the surface of TGF-α
[...] structure of the growth factor. The immediate
observation from Fig. 4 is the localization of the
[...] of absent NOEs) are clustered,
for the most part, in the N-terminal subdomain of the ligand.
The white colors (indicating 20-30% of NOEs that disappear)
[...] receptor-bound state (shown in red) are
[...] constitutes one
face of TGF-α. This face consists of residues in the A- and
C-loops and the C-terminal tail of the ligand. This face also
includes Glu-44, which loses 30-40% of NOEs upon binding
[...] His-12, Phe-15, Phe-17, Val-39, Gly-40,
and His-45 (which lose 40-60% of NOEs) on the surface of the
[...] on the C-terminal tail of TGF-α
both lose 50% of their NOEs, indicating that these residues
[...]

[...] of intramolecular,
intermolecular, and long-range (i, i+2 or further) NOEs that
[...] distribution as
[...] [.]9, 36, and 39% of the intramolecular, intermolecular,
[...] This suggests that the structure is
[...]
NOE is not greatly more affected than another. If the case were
considered where a significantly higher percentage (relative to
the free ligand) of intermolecular rather than intramolecular
[...] upon binding, then this would suggest
[...] perturbations rather than complexation [...]
[...] information on
receptor binding that augments that previously discussed. Fig.
5 shows the number of NOEs that disappear upon addition of
[...] the most
[...]
(Fig. 4) and thus predicate the implied binding interface.



DISCUSSION 

The NOE analysis method for elucidating the [...]
contact sites of the ligand is discussed [...]
Fig. 3. Plot of the total number of
NOEs per residue in free TGF-α (A)
and the percentage of the total number
of NOEs per residue that are absent
in the receptor-bound form of
the ligand (B).




understanding of the molecular requirements for TGF-a com- 
plex formation with the EGF receptor. A plethora of studies 
have been undertaken to determine the components of the 
TGF-a/EGF system that are requisite for receptor binding and 
activation; however, these as a whole have failed to provide a 
consensus as to the essential residues and regions (19-25). It 
has been established that a correct native fold of both growth 
factors is critical for biological activity since the disruption of 
any single disulfide results in complete loss of binding (14). 
Deletion studies using synthetic peptides indicate that virtu- 
ally no part of these polypeptides can be removed without a 
significant decrease in activity. Even removal of the flexible 
N-terminal tail in the case of TGF-α yields an analog with only
3% of the binding affinity of the native molecule (20). 

Studies of EGF and TGF-a mutants in which single amino 
acids are replaced in a conservative or nonconservative fashion 
have, for the most part, produced conflicting results. Despite 
this, the critical role of certain residues in receptor binding and
activation has emerged, including Phe-15, Tyr-38, Arg-42, and 
Leu-48 (14-18). 

The results obtained by the NOE analysis method support a
multidomain model for receptor binding as postulated (19-25)
and shown in Fig. 4. The majority of the most important resi- 
dues for EGF receptor complex formation as suggested by the 



residues that lose the highest percentage of NOEs are presented
on a common face of TGF-α that comprises the A- and
C-loops and consists of His-12, Phe-15, Phe-17, Val-39, Gly-40,
and His-45, thus strongly implicating this face as a binding
determinant for the ligand/receptor interaction. This postulation
is corroborated by a previous study that demonstrated that
antibodies specific for an epitope on the opposite face, consisting
of the residues of the B-loop, were non-neutralizing in
terms of receptor binding and thus proposed that the face of
TGF-α including residues 12-20 and 34-43 was involved in
binding (25). When the residues for which 30-40% of the NOEs
were absent in the bound ligand were included in the face, a
more extensive contiguous surface for receptor binding was
apparent. This surface now includes two of the critical residues
(Arg-42 and Phe-15) that lose 31 and 46% of their NOEs,
respectively. Arg-42, which appears to be less important for
binding based on the NOE criterion, may play a structural role
in preserving the local conformation of Phe-15, which is a
critical residue based upon the NOE data. Structural studies of
the inactive R42K mutant exclude any gross conformational
changes; however, they do not rule out subtle effects that alter the
microenvironment of the phenylalanine (17, 36).

A recent study on a chimeric growth factor consisting of the 
A- and C-loops of EGF and the B-loop of TGF-α concluded that,
TGF-alpha: % of NOEs which disappear upon addition of EGF receptor




Fig. 4. Connolly surface representation
of TGF-α illustrating the percentage
of absent NOEs per residue
grouped into classes. 0%, blue; 0-20%,
purple; 20-30%, white; 30-40%, green;
and 40-60%, red.
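
The grouping in the caption above can be sketched as a small binning function. This is a Python illustration only: the function names are hypothetical, and because the caption does not say which class a value exactly on a boundary (20, 30, or 40%) belongs to, the sketch assumes that each boundary value falls in the lower class.

```python
def percent_absent(noes_free, noes_absent):
    """Percentage of a residue's free-ligand NOEs that are absent when bound."""
    if noes_free <= 0:
        raise ValueError("residue has no NOEs in the free ligand")
    return 100.0 * noes_absent / noes_free


def noe_loss_class(percent):
    """Map a percentage of absent NOEs to the color classes of Fig. 4.

    Boundary handling (upper bound inclusive) is an assumption; the
    caption only gives the class ranges.
    """
    if percent < 0.0 or percent > 100.0:
        raise ValueError("percentage must lie between 0 and 100")
    if percent == 0.0:
        return "blue"
    if percent <= 20.0:
        return "purple"
    if percent <= 30.0:
        return "white"
    if percent <= 40.0:
        return "green"
    if percent <= 60.0:
        return "red"
    return "unclassified"  # above the highest class listed in the caption


if __name__ == "__main__":
    # Percentages quoted in the text for individual residues.
    for residue, pct in [("Arg-42", 31.0), ("Phe-15", 46.0), ("Leu-48", 50.0)]:
        print(residue, noe_loss_class(pct))
```

With the values quoted in the text, Arg-42 (31%) falls in the green class, while Phe-15 (46%) and the C-terminal leucines (50%) fall in the red class.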
Since the hybrid protein exhibited enhanced binding to the
[...] for receptor binding and mitogenic activity (22). This
[...] due to the multiplicity of
[...] the B-loop plays
[...] providing a molecular scaffold for
presentation of the A- and C-loops to the receptor. In the case
of [...] chimeric protein, the enhanced activity probably results
[...] the same group
[...] constrained B-loop analog of
TGF-α (37); however, it was very low relative to the native
molecule and thus does not eliminate the role of A- and C-loop
residues since the latter have been shown to contribute significantly
to binding and activation of the receptor (14-18).

[...] apparent that leucines 48 and 49 are
immobilized in the receptor-bound ligand; however, it is also
[...]
[...] mentioned receptor interface. This observation implies
that these residues provide a second interface that is [...]
[...]
analog of TGF-α (18) corroborates the role of this face as a
second anchor point to the EGF receptor.

Site-directed mutagenesis of Tyr-38 in TGF-α and Tyr-37 in
Fig. 6. Comparison of the data obtained
from measurement of relaxation
rates (1/T2) versus data from
the NOE analysis displayed by color
on the ribbon diagram of TGF-α. The
differential mobility as suggested by the
data is colored as follows: high mobility,
blue; intermediate, green; low mobility,
red; and no data, white.



EGF apparently gives conflicting results for the significance of 
this residue in the receptor interaction since it was shown to be 
nonessential in EGF (38) and essential in TGF-a (16). This 
conflict may be the consequence of a different mechanism or 
site of binding with the two ligands; however, our NOE data 
indicate that if mutation of this residue precludes binding of 
TGF-a, then it does so by altering the ligand conformation and 
thus probably does not contribute directly to the interaction. 

If the NOEs that disappear upon addition of the first recep- 
tor aliquot are considered, then further insight into the recep- 
tor contacts is obtained. Of special note in the consideration of 
these NOEs are those of the greatest intensity that are absent 
in the spectrum of the first point of the titration. It can be 
envisaged that these NOEs belong to residues of TGF-a bound 
most tightly to the EGFR-ED. These residues include His-12,
Phe-15, Phe-17, Phe-23, Val-39, His-45, and Leu-49 and, for the
most part, predicate the postulated binding faces and the con- 
tentions of the analog studies. 

It can be observed from Fig. 4 that the green residues (30- 
40% absent NOEs) are clustered on the view of TGF-a on the 
right. It is tempting to suggest that this face forms a binding 
interface for a second EGF receptor molecule and is of lower 
affinity than the face illustrated on the left Connolly surface 
representation in Fig. 4. Obviously, further experiments would 
be required to confirm this, although it has been proposed that 
the TGF-a/EGF/EGFR system is similar to that of human 
growth hormone, where one hormone binds two receptor mol- 
ecules and some evidence for this mechanism has been dis- 
cussed (39). 

Hoyt et al. (26) have recently published a detailed study of 
methyl relaxation rates of TGF-a both free in solution and in 
complex with the receptor. Since these results and the current 
study were performed under the same conditions, congruity is 
to be expected between the two methods of analyzing the
ligand/receptor interactions. Comparison of the transverse
relaxation rates for all of the methyl-containing residues of
TGF-α was used to delineate the relative mobilities of these
residues within the TGF-α structure. Val-1 and Val-2 showed
the highest mobility when receptor bound with virtually no
enhancement of the transverse relaxation rate upon receptor binding. Since these two
residues are flexible in both the bound and free forms, they are
largely devoid of NOEs, and thus, the NOE comparison with
the relaxation data cannot be made. Thr-13, Thr-20, Leu-24,
Val-25, and Ala-41 all undergo intermediate loss of flexibility
as determined from the relaxation rate values. This compares favorably
with the NOE data (Fig. 6), which suggest, with the exception
of Val-25, that these residues are in contact with the receptor.
Val-25, which is part of the β-turn of the β-sheet in the B-loop,
does not lose any NOEs upon receptor binding, thus suggesting
that this residue is not involved in receptor binding. Its methyl
relaxation, however, indicates that its mobility is somewhat
restricted upon complexation. A possible explanation is that its
mobility is restricted compared with the very flexible N-termi- 
nal tail, but it is still flexible enough that the NOEs of this 
residue, which may not be in direct contact with the receptor, 
thus do not disappear due to magnetization bleed-off. The 
residues of the C-loop and C-terminal tail show strong agree- 
ment between the relaxation and NOE data as these have both 
the largest relaxation enhancements and the largest percent- 
age of NOEs that disappear upon complex formation. 

The results of this study demonstrate that the NOE analysis 
method provides a model that explicates the current under- 
standing of TGF-a/EGF interactions with the EGF receptor in 
terms of a multidomain model and provides significant infor- 
mation on the residues contributing to binding and activation. 

CONCLUSIONS 

Detailed analysis of the NOEs for free and bound species of 
TGF-a indicates that the majority of residues of the ligand that 
have the highest percentage of absent NOEs in the bound form 
embody one face of the molecule that is composed of the A- and 
C-loops. A second receptor anchor point is formed by the two 
C-terminal leucines. The NOE analysis results are consistent 
with relaxation studies that indicate restricted C-terminal mobility
in bound TGF-α and with structure-function studies that
suggest a multidomain ligand binding model. The elucidation 
of the ligand interaction sites is essential for the future devel- 
opment of TGF-a agonists and/or antagonists using structure- 
based drug design methodology. 
ABSTRACT We evaluate homology-derived
3D models of dihydrofolate reductase
(DFR₁), phosphotransferase enzyme IIA domain
(PTE2A₃), and mouse/human UBC9 protein
(UBC9₂₄) which were submitted to the
second Meeting on the Critical Assessment of
Techniques for Protein Structure Prediction
(CASP). The DFR₁ and PTE2A₃ models, based
on alignments without large errors, were
slightly closer to their corresponding X-ray
structures than the closest template structures.
By contrast, the UBC9₂₄ model was
slightly worse than the best template due to a
misalignment of the N-terminal helix. Although 
the current models appear to be more accurate 
than the models submitted to the CASP meeting
in 1994, the four major types of errors in
side chain packing, position, and conformation 
of aligned segments, position and conforma- 
tion of inserted segments, and in alignment 
still occur to almost the same degree. The 
modest improvement probably originates from 
the careful manual selection of the templates 
and editing of the alignment, as well as from 
the iterative realignment and model building 
guided by various model evaluation tech- 
niques. This iterative approach to comparative 
modeling is likely to overcome at least some 
initial alignment errors, as demonstrated by 
the correct final alignment of the C terminus of 
DFRi. Proteins, Suppl. 1:50-58, 1997. 
© 1998 Wiley-Liss, Inc. 

Key words: evaluation; comparative protein
modeling; Modeller 

INTRODUCTION 

Protein modelers were challenged for the second 
time to model sequences without available 3D struc- 
tures and to submit them to the CASP meeting in 
December 1996 (CASP; URL http://PredictionCenter.llnl.gov/).
At the same time, the 3D structures were
being determined by X-ray crystallography and NMR 
methods. Because the experimentally determined 
structures were only released at the meeting, it was 
possible to test the modeling methods objectively. A 
summary of all comparative models submitted to 
CASP2 can be found elsewhere in this issue (A.C.R. 
Martin et al.). 
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We submitted homology-derived models of three 
proteins: DFR₁, PTE2A₃, and UBC9₂₄; the subscript
indicates the target sequence number assigned by
the organizers of CASP2. All three structures have
been determined by X-ray crystallography: DFR₁ at
2.6 Å resolution and R factor of 18% (U. Pieper and
O. Herzberg, in preparation), PTE2A₃ at 2.4 Å
resolution (K. Huang and O. Herzberg, in preparation),
and UBC9₂₄ at 2.0 Å resolution and R factor of
16% (H. Tong and T. Sixma, in preparation). These
three target sequences were chosen because they
have a relatively low (<43%) sequence identity with
their templates. In this range of sequence similarity,
the largest errors in comparative modeling due to
misalignments begin to appear. It is important to
concentrate on this range of sequence similarity
because most of the detectable related sequence-structure
pairs are related at less than 40% sequence
identity, despite earlier indications to the
contrary.

Our approach to comparative protein structure
modeling is based on satisfaction of spatial restraints
and is implemented in the program Modeller.
This program can be used in all stages of typical
comparative modeling: finding suitable template
structures in the PDB, aligning them with the
sequence to be modeled, calculating the 3D model,
and evaluating the model. Comparative protein modeling
was recently reviewed.



†Modeller is available at URL http://guitar.rockefeller.edu:pub/modeller
and also as part of Quanta and InsightII (MSI,
San Diego, CA. E-mail: blp@msi.com).



Abbreviations: DFR₁, Haloferax volcanii dihydrofolate reductase;
PTE2A₃, Mycoplasma capricolum phosphotransferase
enzyme IIA domain; UBC9₂₄, mouse/human UBC9 protein;
NMR, nuclear magnetic resonance; PDB, Brookhaven Protein 
Data Bank; RMSD, root-mean-square deviation; 3D, three- 
dimensional; CASP, critical assessment of techniques for pro- 
tein structure prediction. 
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In this article, we briefly describe the modeling 
method and then concentrate on evaluation of the 
three submitted models. In particular, we discuss 
the question of whether or not the models are 
generally closer to the X-ray structure being modeled 
than the template structures. 

METHODS 

The first step in comparative modeling of the three
target proteins was identification of potential template
structures. This was followed by several cycles
of template selection, target-template alignment,
model building, and model evaluation. The aim of
the iteration was to minimize the errors in the model
reported by various model evaluation techniques.
This iterative process, including careful manual
selection of the templates and editing of the alignments,
is the main difference between the current
approach and that followed two years ago for the
CASP1 meeting.² The final alignments and 3D models
are available from the CASP2 Web site at URL
http://PredictionCenter.llnl.gov/CASP/CM_results/.

Template Selection

Proteins that have known 3D structure and are 
similar to the sequences being modeled had to be 
identified. This was achieved by searching a set of 
sequences representative of the whole PDB (July 1,
1996) [6], using the SEQUENCE_SEARCH command
of Modeller.⁹ The representative set of protein
structures included 916 chains whose sequence identity
was less than 30% to any other chain in the set.
The final templates were as follows: for DFR₁,
4DFR.B (30%, 1.4 Å, 91%), 3DFR (24%, 1.5 Å, 93%),
and 8DFR (22%, 1.6 Å, 94%); for PTE2A₃, 1GPR
(43%, 1.3 Å, 94%) and 1F3G (36%, 1.1 Å, 94%); and
for UBC9₂₄, 1AAK (35%, 1.1 Å, 90%) and 2UCE
(30%, 1.2 Å, 90%). The numbers in the parentheses
are the percentage sequence identity, the RMSD for Cα
atoms, and the fraction of equivalent Cα atoms.
These were all obtained from pairwise template-target
least-squares superpositions with a 3.5 Å
cutoff.
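
As an illustration of the first number quoted for each template, the following minimal sketch computes a percentage sequence identity from two rows of an alignment. It is not the Modeller calculation: the function name is hypothetical, and the choice to count only columns in which both sequences have a residue is an assumption, since the normalization used for the values above is not stated.

```python
def percent_identity(target_row: str, template_row: str, gap: str = "-") -> float:
    """Percentage of identical residues among aligned (gap-free) columns."""
    if len(target_row) != len(template_row):
        raise ValueError("alignment rows must have the same length")
    # Keep only columns where both sequences have a residue (assumed convention).
    columns = [(a, b) for a, b in zip(target_row, template_row)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


# Short UBC9 and 2UCE rows from the loop alignment shown with Fig. 2:
# 3 of the 7 gap-free columns are identical.
print(round(percent_identity("ILEEDKDWR", "ILKD--QWS"), 1))
```

Other conventions (for example, dividing by the length of the shorter sequence) would give somewhat different percentages for the same alignment.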

Target-Template Alignment 

Initial multiple template-target alignments were 
obtained by aligning the target sequences with the 
prealigned template structures, using the ALIGN2D
command of Modeller. This command implements a
global dynamic programming¹⁰ algorithm with a
variable gap-penalty function that depends on the
structural context of an insertion or a deletion
(R. Sanchez and A. Sali, in preparation). The gap
penalty is constructed such that insertions and deletions
are less preferred within helices and sheets, buried
regions, straight segments, and also between two
residues that are distant in space. The alignments
also depended on a 20 × 20 amino acid residue
substitution matrix that was derived from 105
structure-structure alignments. The initial calculated
alignments were edited by hand as appropriate (see
below).
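
The ALIGN2D gap-penalty function itself is not given in this article, so the following is only a sketch of the general idea: global dynamic programming in which the cost of a gap depends on where it falls in the template. The function name, the match/mismatch scores (standing in for the 20 × 20 substitution matrix), and the per-position `gap_cost` list (standing in for the structural context) are all assumptions made for illustration.

```python
def align_with_context_gaps(target, template, gap_cost, match=2.0, mismatch=-1.0):
    """Global alignment with a template-position-dependent linear gap penalty.

    gap_cost[j] is the (assumed) penalty for deleting template residue j, or
    for inserting a target residue directly after template residue j.
    """
    n, m = len(target), len(template)
    if len(gap_cost) != m:
        raise ValueError("gap_cost needs one value per template residue")

    def insertion_penalty(j):
        # j template residues consumed so far; charge the last one consumed.
        return gap_cost[max(j - 1, 0)]

    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] - insertion_penalty(0)
        move[i][0] = "ins"
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] - gap_cost[j - 1]
        move[0][j] = "del"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if target[i - 1] == template[j - 1] else mismatch
            candidates = [
                (score[i - 1][j - 1] + sub, "sub"),
                (score[i - 1][j] - insertion_penalty(j), "ins"),
                (score[i][j - 1] - gap_cost[j - 1], "del"),
            ]
            score[i][j], move[i][j] = max(candidates, key=lambda c: c[0])

    # Trace back from the bottom-right corner to recover the aligned rows.
    rows_target, rows_template = [], []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "sub":
            rows_target.append(target[i - 1]); rows_template.append(template[j - 1])
            i, j = i - 1, j - 1
        elif step == "ins":
            rows_target.append(target[i - 1]); rows_template.append("-")
            i -= 1
        else:
            rows_target.append("-"); rows_template.append(template[j - 1])
            j -= 1
    return "".join(reversed(rows_target)), "".join(reversed(rows_template)), score[n][m]


# The same toy sequences with two different (made-up) structural contexts:
# the gap moves away from the template position that is expensive to break.
print(align_with_context_gaps("ACCD", "ACD", [5.0, 1.0, 1.0]))
print(align_with_context_gaps("ACCD", "ACD", [1.0, 5.0, 1.0]))
```

With a uniform gap cost the two placements of the gap would score the same; the position-dependent cost is what breaks the tie, which is the role the structural context plays in the description above.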

Model Building 

The 3D models containing all nonhydrogen atoms
were obtained automatically by satisfying restraints
on many distances, angles, and dihedral angles.
Spatial restraints were extracted from the alignment
of the target sequence with the template
structures and from the Charmm-22 force field.
The whole model, including backbone, side chains,
loops, and insertions, was built in one optimization.
Conformation of the regions aligned with the templates
was based mostly on the template structures,
while the insertions were restrained mostly by the
preferences of the different residue types for the
different areas of the Ramachandran plot.

Model Evaluation 

The models had to satisfy most restraints used to
calculate them, especially the stereochemical
restraints. This was checked by the Modeller
ENERGY command, the Procheck program, and
the WhatCheck program. The most important
evaluation was done by "energy" profiles calculated
by ProsaII, which relies on statistical potentials
involving single residues and pairs of residues.
Additional evaluation was done by "energy" profiles
calculated from a new set of statistical potentials
involving pairs of atoms. Side chain packing was
checked by calculating cavities in the core of
the model using the Quanta Protein Health module
(MSI, San Diego, CA). If any of the model evaluation
tools indicated an error in the model, the model was
changed manually. For example, side chains were
manually repositioned to eliminate a cavity in the
core. Another example is a selection of different
templates and editing of the alignment around the
region with a bad ProsaII profile, followed by another
round of the automated model building.

RESULTS AND DISCUSSION

Although the DFR₁, UBC9₂₄, and PTE2A₃ models
have good stereochemistry, they have errors in four
other categories: distortions or shifts of a region that
is aligned correctly with the templates (e.g., loops,
helices, strands); errors in side chain packing;
distortions or shifts of a region that does not have an
equivalent segment in any of the templates (e.g.,
inserted loops); and distortions or shifts of a region
that is aligned incorrectly with the templates (e.g.,
loops and larger segments with low sequence identity
to the templates). Examples of these errors are
described in the following sections. We also discuss
the lessons learned from this experiment with respect
to automated template mimicking in different
regions of a model; the cycle of template selection,
alignment, model building, and model evaluation;
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and the relative overall similarity of a model and the 
templates to the target X-ray structure. 



Stereochemistry of the Models

The stereochemical features of the models, such as
those evaluated by the Procheck and WhatCheck
programs, are comparable to those in high-resolution
X-ray structures. These features include
bond lengths, angles, improper dihedral angles, position
of residues in the Ramachandran plot, peptide
bond planarity, Cα tetrahedral distortion, nonbonded
interactions, hydrogen bond energies, and
closeness of side chain dihedral angles to ideal
values. It is not surprising that the models are
stereochemically correct since they were calculated
partly by optimizing the stereochemical features as
encoded in the Charmm-22 force field.

Errors in Side Chain Packing

The side chain rotamers were predicted surprisingly
inaccurately. For example, the percentage of χ1
angles for DFR₁, PTE2A₃, and UBC9₂₄ predicted
within 30° of the target values was 42%, 48%, and
65%, respectively. Since at least the UBC9₂₄ X-ray
structure has been refined at a high resolution of 2 Å
and an R factor of 16%, the low prediction accuracy
must reflect significant problems with our side chain
modeling procedure in this range of backbone and
side chain similarities. However, the mistakes made
were not trivial because the models followed their
templates for conserved and similar side chains,
because the model rotamers were not distorted, and
because the cavities in the models were not larger
than those in the X-ray structures. It is not clear
what kind of improvements are needed beyond a
self-evident need for a more accurate energy function
and perhaps a better optimizer.

The difficulty of the side chain modeling problem
in this range of sequence similarity is illustrated by
the fact that the template and target X-ray structures
have different rotamers for up to 45% of the
conserved residues. For example, DFR₁ has 125
residues with at least one side chain dihedral angle,
29 of which are conserved in one of the templates
(PDB code 3DFR), but 12 of these occur in different
rotamer states. A systematic analysis of this phenomenon,
based on highly refined structures, would be
useful. If the target and template X-ray structures
are accurate and the finding proves to be general,
this indicates that the side chains should be modeled
on the basis of more general physical principles
rather than by mimicking the templates, especially
when the backbones of the target and the
template have an RMSD larger than 2 Å. An additional
complication for the evaluation of side chain
models is that for the two targets refined at a low
resolution of 2.6 Å (DFR₁) and 2.4 Å (PTE2A₃), it is
not clear that all the differences between the models
and the X-ray structures are due to the mistakes in
the modeling procedure.
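
The χ1 accuracy quoted in this section (the percentage of angles within 30° of the target values) requires that angle differences be taken on a circle, so that, for example, 175° and -175° count as 10° apart. A minimal sketch of such a measure follows; the function names and the sample angles are made up for illustration, and treating exactly 30° as a hit is an assumption.

```python
def angular_difference(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two dihedral angles, in degrees (0 to 180)."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def fraction_within_tolerance(model_angles, target_angles, tolerance=30.0):
    """Fraction of model dihedral angles within `tolerance` degrees of the target."""
    pairs = list(zip(model_angles, target_angles))
    if not pairs:
        raise ValueError("no angles to compare")
    hits = sum(1 for m, t in pairs if angular_difference(m, t) <= tolerance)
    return hits / len(pairs)


# Made-up chi1 values for four residues: two agree, two are in other rotamers.
model_chi1 = [60.0, -60.0, 175.0, 180.0]
target_chi1 = [65.0, 60.0, -175.0, -60.0]
print(fraction_within_tolerance(model_chi1, target_chi1))
```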



Distortions or Shifts in Correctly Aligned 
Regions: Template Mimicking in Different 
Regions of a Model 

For all three models, at least two template struc- 
tures were used. Thus, it was possible to determine 
how frequently the automated model building se- 
lected the best template for a given segment where 
the templates shared different degrees of structural
similarity with the target structure. The ability to 
pick locally optimal templates is important because 
it allows the model to be overall closer to the correct 
structure than any of the individual templates. 

The distances of the positions of the Cα atoms of
the model and the templates from the equivalent
atoms in the superposed target X-ray structure are
shown for DFR₁ and UBC9₂₄ in Figure 1. For the
correctly aligned regions, the model always follows
one of the templates. When two templates differ in a
given correctly aligned region, the model generally
follows the template that is structurally closer to the
experimental structure: six such segments of at
least three residues with distances between the
templates of at least 1 Å occur in the DFR₁ and
UBC9₂₄ models. For the correctly aligned regions,
there are no examples of the model following a 
suboptimal template. As a consequence, the model is 
generally closer overall to the experimental struc- 
ture than any of the templates (see also Fig. 4). 
However, for a given region, model building does not 
result in a model that is better than the best 
template in that region (Fig. 1). 

These observations are a direct consequence of the 
form of the homology-derived distance restraints.
The restraints are expressed as probability density 
functions. When several templates are aligned with 
a given segment in the target sequence, a restraint 
on an inter- or intrasegment distance has a multimo- 
dal shape with the peaks corresponding to the equiva- 
lent distances in the templates, not to the average 
distance. The heights and the widths of the peaks 
are determined by the overall and local sequence 
similarities between the templates and the target 
sequence, such that the model is most likely to 
resemble the template with the most similar se- 
quence. This means that the model is generally 
closer to one or the other template by construction. 
In order to allow for the modeling of distortions or 
shifts relative to the template structures, a scoring 
function that guides the model in the correct direction
from the template to the target structure is
necessary. A combination of homology-derived restraints
with atom-based statistical potentials
is perhaps one way of achieving this aim.
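
The multimodal shape of such a restraint can be illustrated with a weighted sum of Gaussians, one peak per template. This is a sketch only: the Gaussian form, the function name, and the numerical weights and widths are assumptions for illustration, and they do not reproduce the way Modeller derives peak heights and widths from sequence similarity.

```python
import math


def restraint_density(distance, template_distances, weights, sigmas):
    """Probability density of a distance modeled as a weighted sum of Gaussians."""
    total = sum(weights)
    density = 0.0
    for mu, w, s in zip(template_distances, weights, sigmas):
        gauss = math.exp(-0.5 * ((distance - mu) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        density += (w / total) * gauss
    return density


# Two templates place the same pair of atoms 5.0 and 8.0 Angstroms apart
# (made-up values); the first template is assumed to be more similar in
# sequence to the target and therefore gets the larger weight.
peaks, weights, sigmas = [5.0, 8.0], [0.7, 0.3], [0.5, 0.5]
for d in (5.0, 6.5, 8.0):
    print(d, restraint_density(d, peaks, weights, sigmas))
```

The density at the average distance (6.5) is far lower than at either peak, which is why a model built from such restraints tends to follow one template or the other in a given region instead of the mean of the two.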
Fig. 1. Backbone errors in the UBC9₂₄ (a) and DFR₁ (b, c)
models. The models and templates were superposed as rigid
bodies on the corresponding target structures using a cutoff of 3.5
Å for the equivalent Cα atoms. The curves in (a) and (b) show the
distances from the equivalent atoms in the target. (a) UBC9₂₄
model-target, continuous line; template 2UCE-target, dashed line;
template 1AAK-target, dotted line. (b) DFR₁ model-target,
continuous line; template 4DFR.B-target, dashed line; template
3DFR-target, dotted line. (c) Residues 45-60 and 105-115 of the
DFR₁ model (thick continuous line) and the X-ray structure.



    1aak    APQDN-----NIM
    2uce    GPVGD-----DLY
    UBC9₂₄  VPTKNPDGTMNLM

    1aak    ILQN--QWS
    2uce    ILKD--QWS
    UBC9₂₄  ILEEDKDWR



Fig. 2. Errors in the two UBC9₂₄ loop models. The loops
corresponding to the two insertions in the UBC9₂₄ model (continuous
thick line) are shown superposed with the target X-ray
structure (continuous thin line) and the templates 1AAK (dotted line)
and 2UCE (dashed line). The numbers indicate the beginning and
ending residues of each segment in UBC9₂₄. The corresponding
regions of the modeling alignment are shown with each set of the
structures.



Errors in Loops 

There were only two insertions in the three models,
both of them in UBC9₂₄ (Fig. 2). The longer
insertion was only five residues long (residues 40-44),
and the second insertion was two residues long
(residues 108-109). When the whole model was
superposed on the X-ray structure, the RMSD between
the backbones for the five-residue loop was 6.7
Å; when the backbones of only the two loops were
superposed locally, the RMSD was 1.7 Å. Thus, both
the orientation and conformation of the predicted
loop were incorrect. The large difference between the
two numbers shows that the positioning of the loop
relative to the rest of the protein can be a very
important contributor to the total error even in the
case of relatively short loops. The alignment in the
neighborhood of the loop was correct, except perhaps
for the alignment of residue 39, which probably
should not have been aligned with any residue in the
templates (Fig. 2). The RMSD for the backbones of
the three residues preceding (37-39) and the three
residues following the loop (45-47) was 2.3 Å and 1.5
Å for the global and local superposition, respectively.
The average backbone isotropic temperature factors
for the five- and two-residue insertions were 24.4 Å²
and 22.2 Å², respectively, compared to the slightly
lower average of 16.4 Å² for the backbone of the
whole protein. Thus, if the loops are not in contact
with other protein molecules in the crystal, it is
likely that the differences between the insertions in
the crystal structure and the model reflect errors in
the model.



Fig. 3. Alignment problems and solutions. (a) Alignment of the
N-terminal region of UBC9₂₄. The alignment used for model
building (modeling) and the correct alignment derived from the
superposition of the experimental structures of the templates and
the target (3D) are shown. The ProsaII energy profiles for the
model (continuous line) and the target X-ray structure (dashed
line) are shown below the alignment. Note the lower energy of the
X-ray structure in the misaligned region. (b) The correct and
alternative alignments for the C-terminal region of DFR₁. The
ProsaII energy profiles for the corresponding 3D models are
shown below the alignment. The model based on the correct DFR₁
alignment, continuous line; the model based on the alternative
alignment, dashed line. Note the positive energy for the alternative
model in the C-terminal region. (c) Superposition of the C-terminal
region of the correct (continuous line) and alternative model of
DFR₁ (dashed line) with the X-ray structure (thin line).

Distortions or Shifts in Incorrectly Aligned
Regions: The Cycle of Alignment, Model
Building, and Model Evaluation

In the three models, there was only one secondary
structure segment that was misaligned, the
N-terminal helix of UBC9₂₄. In addition, there were
three, zero, and one gaps in the modeling alignments
for DFR₁, UBC9₂₄, and PTE2A₃, respectively, where
one or a few residues were misaligned.

In the UBC9₂₄ model, the N-terminal segment of
11 residues was misaligned by one position, which
resulted in large errors in the model (Fig. 1a). This
misalignment was unexpected because the correct
alignment corresponded to a significantly lower sequence
similarity between the target and the template
(Fig. 3a). For example, the number of matches
between hydrophobic residues is decreased and the
number of matches between hydrophobic and polar
residues is increased when the incorrect alignment
is corrected. The misalignment was not detected by
the ProsaII profile of the model (Fig. 3a). However,
the comparison of the profiles for the X-ray structure
and the model shows that the X-ray structure has a
lower ProsaII score in that region (Fig. 3a). This
suggests that the search for the alignment with the
lowest ProsaII profiles of the implied model could
conceivably result in the correct alignment and thus
a significantly better model in this case.

Another interesting observation is that the overall 
sequence identity between the target sequence and 
the more similar of the two templates dropped from 
39% to 35% for the correct alignment. This makes 
the point that optimizing only sequence similarity is 
not always best in comparative modeling. 

In the DFR₁ model, it was difficult to
align the last 13 residues, corresponding to the last
strand of the last β hairpin (Fig. 3). Two plausible
alternative alignments were generated manually by
taking into account local sequence similarity, secondary
structure predictions for DFR₁, and the
structures of the template proteins. The alignments
were evaluated by comparing the ProsaII profiles of
the models based on those alignments (Fig. 3b). One
of the models had a positive profile, and the other one
had a negative profile at the C terminus. A comparison
of the two models with the X-ray structure
showed that the model with the negative profile was
indeed correctly aligned with the template (Fig. 3c).



Fig. 4. Similarity curves for the DFR₁ (a, b, c) and UBC9₂₄ (d,
e) models and templates. See the Methods section for the
definition of the similarity curves. (a and d) The optimal superposition
of the templates and the X-ray structure was used to define the
equivalent residues. (b, c, and e) The modeling alignment was
used to define the equivalences between the templates and the
target. In (a), (b), (d), and (e), only the Cα atoms are used to
calculate RMSD. (c) All atoms are used to calculate RMSD.
Model-target, thick continuous line; template 4DFR.B-target and
template 2UCE-target, dashed line; template 3DFR-target and
template 1AAK-target, dotted line; template 8DFR-target, thin
continuous line.

As illustrated above, alignment errors are a major 
source of large errors in comparative models. We 
attempted to overcome this limitation by iterating 
through several cycles of careful manual template 
selection and alignment, followed by automated 
model building and model evaluation. This process 
was guided by a reduction in the errors predicted by 
a number of model evaluation techniques, most 
importantly the "energy" profiles calculated by the
ProsaII program and a program of Melo and Feytmans.
Despite our limited experience, we believe
that evaluation of an alignment at the level of the
implied model is likely to overcome a significant
fraction of initial alignment errors, especially when
better potential functions for model evaluation become
available and when the iterative procedure is
automated so that a larger number of alternative
alignments can be explored.
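
The automated version of this cycle can be sketched as a search over candidate alignments in which each alignment is judged by a score of the model it implies. The code below shows only that selection logic. The names `select_alignment`, `build_model`, and `profile_score` are hypothetical placeholders: they are not Modeller or ProsaII calls, and a single number per model is a simplification of a per-residue profile.

```python
def select_alignment(alignments, build_model, profile_score):
    """Return (alignment, score) for the alignment whose implied model scores lowest."""
    best = None
    for alignment in alignments:
        model = build_model(alignment)   # placeholder for automated model building
        score = profile_score(model)     # placeholder for an "energy" profile summary
        if best is None or score < best[1]:
            best = (alignment, score)
    if best is None:
        raise ValueError("no candidate alignments were supplied")
    return best


# Toy stand-ins: the "model" is just the alignment label, and the scores are
# invented to mimic one negative and one positive profile.
toy_scores = {"correct": -1.2, "alternative": 0.8}
print(select_alignment(["alternative", "correct"], lambda a: a, toy_scores.get))
```

In practice the expensive steps are the model building and evaluation, so the number of alternative alignments that can be explored depends on how cheaply those two placeholders can be run.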

Overall Accuracy of the Models: Relative 
Overall Similarity of a Model and the 
Templates to the Target X-ray Structure 

We now wish to answer the question of whether 
the predicted structures are a better model of the 
experimental structures than the templates used in 
the calculation of the models. In other words, how 
much closer is a comparative model of the target 
sequence to the target X-ray structure than the 
closest template structure? 

Although a single RMSD value is useful for measur- 
ing a difference between two relatively similar struc- 
tures, RMSD depends on the number of equivalent 
atom pairs that are compared, which in turn de- 
pends on the maximal allowed distance between two 
equivalent atoms. This makes a single RMSD value 
inconvenient for comparing differences between pairs 
of different proteins. One solution to this problem is 
to define a similarity curve for a pairwise structure- 
structure comparison by plotting RMSD as a func- 
tion of the number of equivalent atoms. The similar- 
ity curve is obtained by calculating RMSD at different 
cutoff values for equivalencing intermolecular pairs 
of Cα atoms and plotting the resulting RMSD values
against the number of equivalent positions obtained 
at each cutoff. Two similarity curves, instead of two 
single RMSD numbers, can then be inspected for a 
comparison of two protein-protein matches. 
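
A similarity curve of this kind can be sketched as follows. The sketch assumes that the two structures have already been superposed once and that the distances between candidate equivalent Cα pairs are available; whether the superposition is re-optimized at each cutoff is not stated here, so keeping it fixed is an assumption. The function name and the sample distances are made up.

```python
import math


def similarity_curve(pair_distances, cutoffs):
    """List of (number of equivalent pairs, RMSD) points, one per distance cutoff."""
    curve = []
    for cutoff in sorted(cutoffs):
        # Pairs closer than the cutoff are treated as equivalent at this level.
        kept = [d for d in pair_distances if d <= cutoff]
        if kept:
            rmsd = math.sqrt(sum(d * d for d in kept) / len(kept))
            curve.append((len(kept), rmsd))
    return curve


# Made-up distances (in Angstroms) between four pairs of C-alpha atoms.
for n_equivalent, rmsd in similarity_curve([0.5, 1.0, 3.0, 6.0], [1.0, 3.5, 8.0]):
    print(n_equivalent, round(rmsd, 2))
```

Plotting the RMSD against the number of equivalent pairs for two comparisons then shows which pair of structures is closer at any chosen number of atoms or at any chosen RMSD.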

The similarity curves for the three pairwise comparisons
of the DFR₁ model and the two templates
with the target structure are plotted in Figure 4a. 
The curves show that over a large range of the 
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number of equivalent atoms, the model is slightly 
closer to the experimental structure (lower RMSD 
value) than either of the two templates. In other 
words, at a fixed number of atoms compared, the 
model atoms have a lower RMSD from the X-ray
structure than the template atoms; conversely, at a
fixed RMSD, the model has more atoms equivalent to
the X-ray structure than either of the templates. 
However, the differences are small, <10% over most 
of the similarity range. 

Errors in the positioning of three gaps in the DFR₁
modeling alignment contributed to the similarity
curve for the model-target comparison, but not to the
template-target similarity curves in Figure 4a, which
were obtained from the superposition of the crystallographic
structures. In order to eliminate the contribution
of the alignment errors and evaluate the
model building procedure on its own, the similarity
curves were recalculated using the modeling alignment
for comparison of the templates with the target
structure (Fig. 4b). Since the template-target comparisons
now include the alignment errors, the templates
are less similar to the target X-ray structure
than in Figure 4a. However, the difference between
the model and the templates in how well they
represent the target structure is still small, on the
order of 10% of RMSD.

When side chain atoms were included in the
calculation of the similarity curves, the DFR₁ model
became an even better representation of the target
structure relative to the templates (Fig. 4c). For
example, the model had approximately 95% of its
atoms superposed with an RMSD from the target
structure of 2 Å, while the closest template only had
78% of the atoms at that level of similarity (Fig. 4c).
This was expected because the templates do not
share all the side chain atoms with the target
structure while the model does.

In contrast to DFR₁, the UBC9₂₄ model is worse
than the best template because of the alignment
errors, primarily the shift by one position of the
N-terminal 11 residues (Fig. 4d). The PTE2A₃ model
is as close to the target structure as the best template
(data not shown).

All comparative modeling methods start with an
alignment of the target sequence with the template
structures, followed by model building that is decoupled
from the alignment procedure. Therefore,
when evaluating comparative modeling methods, it
is important for method developers to distinguish
between errors due to misalignments and errors due
to the model building procedure. This distinction is
also important for the method users because the
modeling alignment, not the correct alignment, would
be used to extract information from the template
structure in the absence of any model building.
When the modeling alignment is used to compare
both the model and the templates with the target
structure, all three models are a better representation
of the experimental structure than the templates
used in their derivation (Fig. 4b,e; data not
shown for PTE2A₃). This is especially true when the
side chain as well as backbone atoms are compared
(e.g., Fig. 4c). These comparisons suggest that it is
better to use a comparative model of the target than
homologous structures, unless only coarse predictions
are made.

CONCLUSIONS 

The modest improvement in our models relative to
CASP1 probably originates from the careful manual
selection of the templates and editing of the alignment,
as well as from the iterative realignment and
model building. This suggests directions for future
development of the algorithms that will, it is hoped,
result in larger increases in the model accuracy.
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